

Research Article
Path-Counting Formulas for Generalized Kinship Coefficients and Condensed Identity Coefficients

En Cheng (1) and Z. Meral Ozsoyoglu (2)

(1) Computer Science Department, The University of Akron, Akron, OH 44325, USA
(2) Electrical Engineering and Computer Science Department, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA

Correspondence should be addressed to En Cheng; echeng@uakron.edu

Received 14 January 2014; Accepted 8 May 2014; Published 21 July 2014

Academic Editor: Zhenyu Jia

Copyright © 2014 E. Cheng and Z. M. Ozsoyoglu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

An important computation on pedigree data is the calculation of condensed identity coefficients, which provide a complete description of the degree of relatedness of two individuals. The applications of condensed identity coefficients range from genetic counseling to disease tracking. Condensed identity coefficients can be computed using linear combinations of generalized kinship coefficients for two, three, and four individuals and for two pairs of individuals, and there are recursive formulas for computing those generalized kinship coefficients (Karigl, 1981). Path-counting formulas have been proposed for the (generalized) kinship coefficients for two (three) individuals, but there have been no path-counting formulas for the other generalized kinship coefficients. It has also been shown that the computation of the (generalized) kinship coefficients for two (three) individuals using path-counting formulas is efficient for large pedigrees, together with path encoding schemes tailored for pedigree graphs. In this paper, we propose a framework for deriving path-counting formulas for generalized kinship coefficients. Then, we present the path-counting formulas for all generalized kinship coefficients for which there are recursive formulas and which are sufficient for computing condensed identity coefficients. We also perform experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees.

Hindawi Publishing Corporation
Computational and Mathematical Methods in Medicine
Volume 2014, Article ID 898424, 20 pages
http://dx.doi.org/10.1155/2014/898424

1. Introduction

With the rapidly expanding field of medical genetics and genetic counseling, genealogy information is becoming increasingly abundant. In January 2009, the US Department of Health and Human Services released an updated and improved version of the Surgeon General's Web-based family health history tool [1]. This Web-based tool makes it easy for users to record their family health history. Large extended human pedigrees are very informative for linkage analysis. Pedigrees including thousands of members in 10–20 generations are available from genetically isolated populations [2, 3]. In human genetics, a pedigree is defined as "a simplified diagram of a family's genealogy that shows family members' relationships to each other and how a specific trait, abnormality, or disease has been inherited" [4]. Pedigrees are utilized to trace the inheritance of a specific disease, calculate genetic risk ratios, identify individuals at risk, and facilitate genetic counseling. To calculate genetic risk ratios or identify individuals at risk, we need to assess the degree of relatedness of two individuals. All measures of relatedness are based on the concept of identity by descent (IBD). Two alleles are identical by descent if one is an ancestral copy of the other or if they are both copies of the same ancestral allele. The IBD concept is primarily due to Cotterman [5] and Malecot [6] and has been successfully applied to many problems in population genetics.

The simplest measure of relationship between two individuals is their kinship coefficient. The kinship coefficient between two individuals $i$ and $j$ is the probability that an allele selected randomly from $i$ and an allele selected randomly from the same autosomal locus of $j$ are identical by descent. To better discriminate between different types of pairs of relatives, identity coefficients were introduced by Gillois [7] and Harris [8] and promulgated by Jacquard [9]. Considering the four alleles of two individuals at a fixed autosomal locus, there are 15 possible identity states. Disregarding the distinction between maternally and paternally derived alleles, we obtain 9 condensed identity states. The probabilities associated with each condensed identity state are called condensed identity coefficients, which are useful in a diverse range of fields. This includes the calculation of risk ratios for qualitative disease, the analysis of quantitative traits, and genetic counseling in medicine.

A recursive algorithm for calculating condensed identity coefficients, proposed by Karigl [10], has been known for some time. This method requires that one calculates a set of generalized kinship coefficients, from which one obtains condensed identity coefficients via a linear transformation. One limitation is that this recursive approach is not scalable when applied to very large pedigrees. It has been previously shown that the kinship coefficients for two individuals [11–13] and the generalized kinship coefficients for three individuals [14, 15] can be efficiently calculated using path-counting formulas together with path encoding schemes tailored for pedigree graphs.

Motivated by the efficiency of path-counting formulas for computing the kinship coefficient for two individuals and the generalized kinship coefficient for three individuals, we first introduce a framework for developing path-counting formulas to compute generalized kinship coefficients concerning three individuals, four individuals, and two pairs of individuals. Then, we present path-counting formulas for all generalized kinship coefficients which have recursive formulas proposed by Karigl [10] and are sufficient to compute condensed identity coefficients. In summary, our ultimate goal is to use path-counting formulas for computing generalized kinship coefficients so that the efficiency and scalability of condensed identity coefficient calculation can be improved.

The main contributions of our work are as follows:

(i) a framework to develop path-counting formulas for generalized kinship coefficients;

(ii) a set of path-counting formulas for all generalized kinship coefficients having recursive formulas [10];

(iii) experimental results demonstrating significant performance gains for calculating condensed identity coefficients based on our proposed path-counting formulas as compared to using recursive formulas [10].

2. Materials and Methods

This section describes kinship coefficients and generalized kinship coefficients, identity coefficients, and condensed identity coefficients in more detail. Conceptual terms for the path-counting formulas for three and four individuals are introduced in Section 2.3. In addition, an overview of path-counting formula derivation is presented.

2.1. Kinship Coefficients and Generalized Kinship Coefficients. The kinship coefficient between two individuals $a$ and $b$ is the probability that a randomly chosen allele at the same locus from each is identical by descent (IBD). There are two approaches to computing the kinship coefficient $\Phi_{ab}$: the recursive approach [10] and the path-counting approach [16]. The recursive formulas [10] for $\Phi_{ab}$ and $\Phi_{aa}$ are

$$
\begin{aligned}
\Phi_{ab} &= \frac{1}{2}\left(\Phi_{fb} + \Phi_{mb}\right) \quad \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aa} &= \frac{1}{2}\left(1 + \Phi_{fm}\right) = \frac{1}{2}\left(1 + F_{a}\right),
\end{aligned}
\tag{1}
$$

where $f$ and $m$ denote the father and the mother of $a$, respectively, and $F_{a}$ is the inbreeding coefficient of $a$.
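The recursion in (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pedigree `PEDIGREE`, the helper `depth`, and the function name `kinship` are hypothetical, founders are assumed non-inbred and mutually unrelated, and the "not an ancestor" condition is enforced by always expanding the individual with the larger generation depth.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical pedigree: child maps to (father, mother); founders map to None.
PEDIGREE = {
    "gf": None, "gm": None,
    "a": ("gf", "gm"), "b": ("gf", "gm"),  # full siblings
    "c": ("a", "b"),                       # offspring of a sib mating
}

@lru_cache(maxsize=None)
def depth(x):
    """Generation depth: 0 for founders, else 1 + the deepest parent."""
    parents = PEDIGREE[x]
    return 0 if parents is None else 1 + max(depth(p) for p in parents)

@lru_cache(maxsize=None)
def kinship(a, b):
    """Phi_ab following the recursive formulas in (1)."""
    if a == b:
        parents = PEDIGREE[a]
        if parents is None:
            return Fraction(1, 2)          # assumed non-inbred founder
        f, m = parents
        return Fraction(1, 2) * (1 + kinship(f, m))
    # Expand the deeper individual: it cannot be an ancestor of the other.
    shallow, deep = sorted((a, b), key=depth)
    parents = PEDIGREE[deep]
    if parents is None:
        return Fraction(0)                 # distinct founders assumed unrelated
    f, m = parents
    return Fraction(1, 2) * (kinship(f, shallow) + kinship(m, shallow))

sib_kinship = kinship("a", "b")
self_kinship_c = kinship("c", "c")
```

Exact fractions are used so that results can be compared with hand calculations; memoization keeps the recursion from revisiting the same pair.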

Wright's path-counting formula [16] for $\Phi_{ab}$ is

$$
\Phi_{ab} = \sum_{A} \; \sum_{\langle P_{Aa}, P_{Ab} \rangle \in PP} \left(\frac{1}{2}\right)^{r+s+1} \left(1 + F_{A}\right),
\tag{2}
$$

where $A$ is a common ancestor of $a$ and $b$, $PP$ is a set of nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ from $A$ to $a$ and $b$, $r$ is the length of the path $P_{Aa}$, $s$ is the length of the path $P_{Ab}$, and $F_{A}$ is the inbreeding coefficient of $A$. The path-pair $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping if and only if the two paths share no common individuals except $A$.
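As a hedged illustration of (2), the sketch below enumerates path-pairs by brute force on a small hypothetical pedigree. The names `PEDIGREE`, `ancestors`, `paths`, `inbreeding`, and `kinship_wright` are assumptions made for this example, an individual is treated as its own ancestor so that the case where $a$ is an ancestor of $b$ is covered, and no path encoding scheme is used, so it is not meant to be efficient.

```python
from fractions import Fraction

# Hypothetical pedigree: child maps to (father, mother); founders map to None.
PEDIGREE = {
    "gf": None, "gm": None,
    "a": ("gf", "gm"), "b": ("gf", "gm"),
    "c": ("a", "b"),
}

def ancestors(x):
    """x together with all of its ancestors (x counts as its own ancestor here)."""
    result = {x}
    parents = PEDIGREE[x]
    if parents is not None:
        for p in parents:
            result |= ancestors(p)
    return result

def paths(A, x):
    """All parent-to-child paths from A down to x, each as a tuple of individuals."""
    if x == A:
        return [(A,)]
    parents = PEDIGREE[x]
    if parents is None:
        return []
    return [p + (x,) for parent in parents for p in paths(A, parent)]

def inbreeding(x):
    """F_x is the kinship of x's parents; founders are assumed non-inbred."""
    parents = PEDIGREE[x]
    return Fraction(0) if parents is None else kinship_wright(*parents)

def kinship_wright(a, b):
    """Phi_ab as the sum in (2) over nonoverlapping path-pairs."""
    total = Fraction(0)
    for A in ancestors(a) & ancestors(b):
        for pa in paths(A, a):
            for pb in paths(A, b):
                if set(pa) & set(pb) == {A}:       # nonoverlapping path-pair
                    r, s = len(pa) - 1, len(pb) - 1
                    total += Fraction(1, 2) ** (r + s + 1) * (1 + inbreeding(A))
    return total

sib_kinship = kinship_wright("a", "b")
```

For the full siblings `a` and `b`, each grandparent contributes one nonoverlapping path-pair with $r = s = 1$, so the sum is $2 \cdot (1/2)^3 = 1/4$, matching the recursive formula (1).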

Recursive formulas proposed by Karigl [10] for generalized kinship coefficients concerning three individuals, four individuals, and two pairs of individuals are listed in (3), (4), and (5), respectively:

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\left(\Phi_{fbc} + \Phi_{mbc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{aab} &= \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) && \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aaa} &= \frac{1}{4}\left(1 + 3\Phi_{fm}\right) = \frac{1}{4}\left(1 + 3F_{a}\right),
\end{aligned}
\tag{3}
$$

$$
\begin{aligned}
\Phi_{abcd} &= \frac{1}{2}\left(\Phi_{fbcd} + \Phi_{mbcd}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c \text{ or } d, \\
\Phi_{aabc} &= \frac{1}{2}\left(\Phi_{abc} + \Phi_{fmbc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{aaab} &= \frac{1}{4}\left(\Phi_{ab} + 3\Phi_{fmb}\right) && \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aaaa} &= \frac{1}{8}\left(1 + 7\Phi_{fm}\right) = \frac{1}{8}\left(1 + 7F_{a}\right),
\end{aligned}
\tag{4}
$$


$$
\begin{aligned}
\Phi_{ab,cd} &= \frac{1}{2}\left(\Phi_{fb,cd} + \Phi_{mb,cd}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c \text{ or } d, \\
\Phi_{aa,bc} &= \frac{1}{2}\left(\Phi_{bc} + \Phi_{fm,bc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{ab,ac} &= \frac{1}{4}\left(2\Phi_{abc} + \Phi_{fb,mc} + \Phi_{mb,fc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{aa,ab} &= \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) && \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aa,aa} &= \frac{1}{4}\left(1 + 3\Phi_{fm}\right) = \frac{1}{4}\left(1 + 3F_{a}\right).
\end{aligned}
\tag{5}
$$

$\Phi_{abc}$ is the probability that randomly chosen alleles at the same locus from each of the three individuals (i.e., $a$, $b$, and $c$) are identical by descent (IBD). Similarly, $\Phi_{abcd}$ is the probability that randomly chosen alleles at the same locus from each of the four individuals (i.e., $a$, $b$, $c$, and $d$) are IBD. $\Phi_{ab,cd}$ is the probability that a random allele from $a$ is IBD with a random allele from $b$ and that a random allele from $c$ is IBD with a random allele from $d$ at the same locus. Note that $\Phi_{abc} = 0$ if there is no common ancestor of $a$, $b$, and $c$; $\Phi_{abcd} = 0$ if there is no common ancestor of $a$, $b$, $c$, and $d$; and $\Phi_{ab,cd} = 0$ in the absence of a common ancestor either for $a$ and $b$ or for $c$ and $d$.
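To make the three-individual recursion concrete, the following sketch evaluates $\Phi_{ab}$ and $\Phi_{abc}$ with the formulas for two individuals and the formulas in (3). It is an illustration under assumptions rather than Karigl's implementation: the pedigree, the function names `depth` and `phi`, and the founder base cases (founders non-inbred and mutually unrelated) are all hypothetical choices. The deepest individual is always the one expanded, which guarantees that it is not an ancestor of the others.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical pedigree: child maps to (father, mother); founders map to None.
PEDIGREE = {
    "gf": None, "gm": None,
    "a": ("gf", "gm"), "b": ("gf", "gm"),
    "c": ("a", "b"),
}

@lru_cache(maxsize=None)
def depth(x):
    parents = PEDIGREE[x]
    return 0 if parents is None else 1 + max(depth(p) for p in parents)

@lru_cache(maxsize=None)
def phi(*xs):
    """Generalized kinship coefficient for two or three individuals."""
    xs = tuple(sorted(xs, key=lambda x: (depth(x), x)))
    a = xs[-1]                              # deepest individual is expanded
    k = xs.count(a)                         # how many times a occurs
    rest = tuple(x for x in xs if x != a)
    parents = PEDIGREE[a]
    if parents is None:
        # Assumed base case: a founder alone gives (1/2)^(k-1); distinct
        # founders share no ancestor, so the coefficient is 0.
        return Fraction(0) if rest else Fraction(1, 2) ** (k - 1)
    f, m = parents
    if k == 1:                              # Phi_ab or Phi_abc
        return Fraction(1, 2) * (phi(f, *rest) + phi(m, *rest))
    if len(xs) == 2:                        # Phi_aa
        return Fraction(1, 2) * (1 + phi(f, m))
    if k == 2:                              # Phi_aab
        (b,) = rest
        return Fraction(1, 2) * (phi(a, b) + phi(f, m, b))
    return Fraction(1, 4) * (1 + 3 * phi(f, m))   # Phi_aaa

phi_aab = phi("a", "a", "b")
phi_abc = phi("a", "b", "c")
```

The four-individual and two-pair coefficients in (4) and (5) follow the same pattern with more cases; they are omitted here to keep the sketch short.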

2.2. Identity Coefficients and Condensed Identity Coefficients. Given two individuals $a$ and $b$ with maternally and paternally derived alleles at a fixed autosomal locus, there are 15 possible identity states, and the probabilities associated with each identity state are called identity coefficients. Ignoring the distinction between maternally and paternally derived alleles, we group the 15 possible states into 9 condensed identity states, as shown in Figure 1. The states range from state 1, in which all four alleles are IBD, to state 9, in which none of the four alleles are IBD. The probabilities associated with each condensed identity state are called condensed identity coefficients, denoted by $\{\Delta_{i} \mid 1 \le i \le 9\}$. The condensed identity coefficients can be computed from generalized kinship coefficients using the linear transformation shown in (6):

$$
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 & 1 \\
2 & 2 & 1 & 1 & 2 & 2 & 1 & 1 & 1 \\
4 & 0 & 2 & 0 & 2 & 0 & 2 & 1 & 0 \\
8 & 0 & 4 & 0 & 2 & 0 & 2 & 1 & 0 \\
8 & 0 & 2 & 0 & 4 & 0 & 2 & 1 & 0 \\
16 & 0 & 4 & 0 & 4 & 0 & 2 & 1 & 0 \\
4 & 4 & 2 & 2 & 2 & 2 & 1 & 1 & 1 \\
16 & 0 & 4 & 0 & 4 & 0 & 4 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
\Delta_{1} \\ \Delta_{2} \\ \Delta_{3} \\ \Delta_{4} \\ \Delta_{5} \\ \Delta_{6} \\ \Delta_{7} \\ \Delta_{8} \\ \Delta_{9}
\end{bmatrix}
=
\begin{bmatrix}
1 \\ 2\Phi_{aa} \\ 2\Phi_{bb} \\ 4\Phi_{ab} \\ 8\Phi_{aab} \\ 8\Phi_{abb} \\ 16\Phi_{aabb} \\ 4\Phi_{aa,bb} \\ 16\Phi_{ab,ab}
\end{bmatrix}
\tag{6}
$$
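One way to see how (6) is used is to solve the system for the $\Delta_i$ once the right-hand side is known. The sketch below does this with exact Gauss-Jordan elimination in plain Python. The function name `solve` and the round-trip check are assumptions made for illustration: the right-hand side is generated from the well-known coefficients of non-inbred full siblings ($\Delta_7 = 1/4$, $\Delta_8 = 1/2$, $\Delta_9 = 1/4$) rather than from a real pedigree computation.

```python
from fractions import Fraction

# Coefficient matrix of (6); column j holds the weights of Delta_(j+1).
M = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [2, 2, 2, 2, 1, 1, 1, 1, 1],
    [2, 2, 1, 1, 2, 2, 1, 1, 1],
    [4, 0, 2, 0, 2, 0, 2, 1, 0],
    [8, 0, 4, 0, 2, 0, 2, 1, 0],
    [8, 0, 2, 0, 4, 0, 2, 1, 0],
    [16, 0, 4, 0, 4, 0, 2, 1, 0],
    [4, 4, 2, 2, 2, 2, 1, 1, 1],
    [16, 0, 4, 0, 4, 0, 4, 1, 0],
]

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination over exact fractions."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        # Clear the column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Round trip: build the right-hand side of (6) from known Delta values,
# then recover the Delta values by solving the system.
delta = [Fraction(0)] * 6 + [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
rhs = [sum(m * d for m, d in zip(row, delta)) for row in M]
recovered = solve(M, rhs)
```

In practice the right-hand side would be filled with the nine generalized kinship values for the pair $(a, b)$, computed either recursively or by path counting.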

In our work, we focus on deriving the path-counting formulas for the generalized kinship coefficients, including $\Phi_{abc}$, $\Phi_{abcd}$, and $\Phi_{ab,cd}$.

2.3. Terms Defined for Path-Counting Formulas for Three and Four Individuals

(1) Triple-Common Ancestor. Given three individuals $a$, $b$, and $c$, if $A$ is a common ancestor of the three individuals, then we call $A$ a triple-common ancestor of $a$, $b$, and $c$.

(2) Quad-Common Ancestor. Given four individuals $a$, $b$, $c$, and $d$, if $A$ is a common ancestor of the four individuals, then we call $A$ a quad-common ancestor of $a$, $b$, $c$, and $d$.

(3) $P(A, a)$. It denotes the set of all possible paths from $A$ to $a$, where the paths can only traverse edges in the direction of parent to child, such that $P(A, a) \neq \text{NULL}$ if and only if $A$ is an ancestor of $a$. $P_{Aa}$ denotes a particular path from $A$ to $a$, where $P_{Aa} \in P(A, a)$.

(4) Path-Pair. It consists of two paths, denoted as $\langle P_{Aa}, P_{Ab} \rangle$, where $P_{Aa} \in P(A, a)$ and $P_{Ab} \in P(A, b)$.

(5) Nonoverlapping Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab} \rangle$, it is nonoverlapping if and only if the two paths share no common individuals except $A$.

(6) Path-Triple. It consists of three paths, denoted as $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, where $P_{Aa} \in P(A, a)$, $P_{Ab} \in P(A, b)$, and $P_{Ac} \in P(A, c)$.

(7) Path-Quad. It consists of four paths, denoted as $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, where $P_{Aa} \in P(A, a)$, $P_{Ab} \in P(A, b)$, $P_{Ac} \in P(A, c)$, and $P_{Ad} \in P(A, d)$.

(8) $Bi\_C(P_{Aa}, P_{Ab})$. It denotes all common individuals shared between $P_{Aa}$ and $P_{Ab}$, except $A$.

(9) $Tri\_C(P_{Aa}, P_{Ab}, P_{Ac})$. It denotes all common individuals shared among $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, except $A$.

(10) $Quad\_C(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$. It denotes all common individuals shared among $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$, except $A$.

(11) Crossover and 2-Overlap Individual. If $s \in Bi\_C(P_{Aa}, P_{Ab})$, we call $s$ a crossover individual with respect to $P_{Aa}$ and $P_{Ab}$ if the two paths pass through different parents of $s$. On the other hand, if $P_{Aa}$ and $P_{Ab}$ pass through the same parent of $s$, then we call $s$ a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$.

(12) 3-Overlap Individual. If $s \in Tri\_C(P_{Aa}, P_{Ab}, P_{Ac})$ and the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$ pass through the same parent of $s$, then we call $s$ a 3-overlap individual with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$.

(13) 2-Overlap Path. If $s$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, then both $P_{Aa}$ and $P_{Ab}$ pass through the same parent of $s$, denoted by $p$, and the edge from $p$ to $s$ is called an overlap edge. All consecutive overlap edges constitute a path, and this path is called a 2-overlap path. If the 2-overlap path extends all the way to the ancestor $A$, we call it a root 2-overlap path.

(14) 3-Overlap Path. It consists of all 3-overlap individuals in a consecutive order. If the 3-overlap path extends all the way to the root $A$, we call it a root 3-overlap path.

Figure 1: The 15 possible identity states for individuals $a$ and $b$, grouped by their 9 condensed states. Lines indicate alleles that are IBD.

Figure 2: Examples of path-pairs and path-triples.
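The set $P(A, a)$ and the notion of a triple-common ancestor can be made concrete with a short sketch. The parent lists below are an assumption: they are read off the example paths quoted in the text for the Figure 2 pedigree and may omit parents that the figure shows. The names `PARENTS`, `paths`, `ancestors`, and `common_ancestors` are hypothetical, and each individual is counted as its own ancestor for convenience.

```python
# Parent lists read off the example paths quoted in the text; an assumed,
# possibly incomplete reconstruction of the Figure 2 pedigree.
PARENTS = {
    "A": [],
    "s": ["A"], "c": ["A"], "d": ["A"],
    "e": ["s", "c"], "f": ["s", "d"],
    "t": ["e", "f"],
    "a": ["t"], "b": ["t"],
}

def paths(A, x):
    """P(A, x): all parent-to-child paths from A to x, as tuples of individuals."""
    if x == A:
        return [(A,)]
    return [p + (x,) for parent in PARENTS[x] for p in paths(A, parent)]

def ancestors(x):
    """x together with all of its ancestors."""
    result = {x}
    for parent in PARENTS[x]:
        result |= ancestors(parent)
    return result

def common_ancestors(*individuals):
    """Individuals that are ancestors of every listed individual."""
    return set.intersection(*(ancestors(x) for x in individuals))

paths_to_a = paths("A", "a")
triple_common = common_ancestors("a", "b", "c")
```

Under these assumed parent lists, `paths("A", "a")` contains four paths, and a path-triple for $(a, b, c)$ is any element of the Cartesian product `paths(A, "a")` x `paths(A, "b")` x `paths(A, "c")` for a triple-common ancestor `A`.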

Example 1. Consider the path-pairs from $A$ to $a$ and $b$ in Figure 2, where $A$ is a common ancestor of $a$ and $b$. For path-pair1, $Bi\_C(P_{Aa}, P_{Ab}) = \{s, e, t\}$ and $A \to s \to e \to t$ is a root 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$. For path-pair4, $Bi\_C(P_{Aa}, P_{Ab}) = \{e, t\}$, where $e$ is a crossover individual, $t$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, and $e \to t$ is a 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$. Because this overlap path starts at the crossover individual $e$ rather than at $A$, it is not a root 2-overlap path.
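The distinction between crossover and 2-overlap individuals in definitions (11) and (13) can be checked mechanically. The sketch below labels every shared individual of a path-pair; the function name `classify_common` and the label strings are assumptions made for this example, and paths are given as tuples of individuals starting at the common ancestor.

```python
def classify_common(path_a, path_b):
    """Label each individual shared by two paths that start at the same ancestor A.

    Labels: "crossover", "root 2-overlap" (on an overlap path that starts at A),
    or "2-overlap" (on an overlap path that does not start at A).
    """
    assert path_a[0] == path_b[0], "both paths must start at the common ancestor"
    root = path_a[0]
    # Parent of each individual along each path.
    parent_a = {child: parent for parent, child in zip(path_a, path_a[1:])}
    parent_b = {child: parent for parent, child in zip(path_b, path_b[1:])}
    labels = {}
    for s in path_a[1:]:
        if s not in parent_b:
            continue                       # s is not shared by the two paths
        if parent_a[s] != parent_b[s]:
            labels[s] = "crossover"        # reached through different parents
        else:
            p = parent_a[s]                # overlap edge from p to s
            from_root = p == root or labels.get(p) == "root 2-overlap"
            labels[s] = "root 2-overlap" if from_root else "2-overlap"
    return labels

# The two path-pairs discussed in Example 1.
pair1 = classify_common(("A", "s", "e", "t", "a"), ("A", "s", "e", "t", "b"))
pair4 = classify_common(("A", "c", "e", "t", "a"), ("A", "s", "e", "t", "b"))
```

Walking along one path in parent-to-child order is enough, because shared individuals appear in the same order on both paths of a pedigree graph.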

Example 2. There are four path-quads listed in Figure 3 from $A$ to four individuals $a$, $b$, $c$, and $d$, where $A$ is a quad-common ancestor of the four individuals. For path-quad2, considering the paths $P_{Aa}$ and $P_{Ab}$, the path $A \to t \to f \to s$ is a root 2-overlap path, and $t$, $f$, and $s$ are 2-overlap individuals with respect to $P_{Aa}$ and $P_{Ab}$. For path-quad3, $t$, $f$, and $s$ are 3-overlap individuals with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, and the path $A \to t \to f \to s$ is a root 3-overlap path.

Table 1 summarizes the conceptual terms used in the path-counting formulas for two, three, and four individuals. At the level of terminology, it gives a first view of our framework for generalizing Wright's formula to three and four individuals.

2.4. An Overview of Path-Counting Formula Derivation. According to Wright's path-counting formula [16] (see (2)) for two individuals $a$ and $b$, the path-counting approach requires identifying common ancestors of $a$ and $b$ and calculating the contribution of each common ancestor to $\Phi_{ab}$. More specifically, for each common ancestor, denoted as $A$, we obtain all path-pairs from $A$ to $a$ and $b$ and identify acceptable path-pairs. For $\Phi_{ab}$, an acceptable path-pair $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair, where the two paths share no common individuals except $A$. In Figure 2, path-pair2 is an acceptable path-pair, while path-pair1, path-pair3, and path-pair4 are not acceptable path-pairs. The contribution of each common ancestor $A$ to $\Phi_{ab}$ is computed based on the inbreeding coefficient of $A$, modified by the length of each acceptable path-pair.

Figure 3: Examples of path-quads.
Path-quad1: $A \to t \to f \to s \to a$; $A \to m \to s \to b$; $A \to c$; $A \to d$.
Path-quad2: $A \to t \to f \to s \to a$; $A \to t \to f \to s \to b$; $A \to c$; $A \to d$.
Path-quad3: $A \to t \to f \to s \to a$; $A \to t \to f \to s \to b$; $A \to t \to f \to s \to c$; $A \to d$.
Path-quad4: $A \to t \to f \to s \to a$; $A \to t \to m \to s \to b$; $A \to t \to m \to s \to c$; $A \to d$.

Table 1: The conceptual terms used for two, three, and four individuals.

Two individuals | Three individuals | Four individuals
----------------|-------------------|-----------------
Common ancestor | Triple-common ancestor | Quad-common ancestor
Path-pair | Path-triple | Path-quad
$Bi\_C(P_{Aa}, P_{Ab})$ | $Tri\_C(P_{Aa}, P_{Ab}, P_{Ac})$ | $Quad\_C(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$
NA | 2-Overlap individual | 3-Overlap individual
NA | 2-Overlap path | 3-Overlap path
NA | Root 2-overlap path | Root 3-overlap path
NA | Crossover individual | Crossover individual

To compute $\Phi_{abc}$, the path-counting approach requires identifying all triple-common ancestors of $a$, $b$, and $c$ and summing up all triple-common ancestors' contributions to $\Phi_{abc}$. For each triple-common ancestor, denoted as $A$, we first identify all path-triples, each of which consists of three paths from $A$ to $a$, $b$, and $c$, respectively. Some examples of path-triples are presented in Figure 2.

For $\Phi_{ab}$, only nonoverlapping path-pairs are acceptable. A path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three path-pairs: $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. For $\Phi_{abc}$, a path-triple might be acceptable even though either 2-overlap individuals or crossover individuals exist between a path-pair. The main challenge we need to address is finding necessary and sufficient conditions for acceptable path-triples.

To identify acceptable path-triples, we first use a systematic method to generate all possible cases for a path-pair by considering the different types of common individuals shared between the two paths. Then, we introduce building blocks, which are connected graphs with conditions on every edge that encapsulate a set of acceptable cases of path-pairs. In each building block, we represent paths as nodes and interactions (i.e., shared common individuals between two paths) as edges. There are at least two paths in a building block. For each building block, we obtain all acceptable cases for the path-pairs concerned. A given path-triple can be decomposed into one or multiple building blocks. When two building blocks share a path-pair, we use the natural join operator from relational algebra to match the acceptable cases for the shared path-pair between the two building blocks. In other words, taking the acceptable cases for building blocks as inputs, we use the natural join operator to construct all acceptable cases for a path-triple. Acceptable cases for a path-triple are identified and then used in deriving the path-counting formula for $\Phi_{abc}$.
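The role of the natural join can be illustrated with a small sketch in which each building block is a relation, i.e., a list of rows keyed by path-pair. Everything concrete here is an assumption: the attribute names `ab`, `ac`, and `bc` stand for the three path-pairs of a path-triple, and the rows are made-up placeholder labels, not the acceptable cases derived in this paper.

```python
def natural_join(r1, r2):
    """Natural join of two relations given as lists of dicts with uniform keys."""
    if not r1 or not r2:
        return []
    shared = set(r1[0]) & set(r2[0])       # attributes common to both relations
    return [
        {**row1, **row2}
        for row1 in r1
        for row2 in r2
        if all(row1[k] == row2[k] for k in shared)
    ]

# Hypothetical building blocks. Each row assigns a case label to a path-pair;
# the labels are placeholders chosen only to show how the join matches rows.
block_1 = [
    {"ab": "T", "ac": "none"},
    {"ab": "X", "ac": "T"},
]
block_2 = [
    {"ac": "none", "bc": "none"},
    {"ac": "T", "bc": "X"},
    {"ac": "X", "bc": "T"},
]

# Rows survive only when both blocks agree on the shared path-pair "ac".
triple_cases = natural_join(block_1, block_2)
```

Each surviving row assigns one case to every path-pair of the path-triple, which is the form in which combined cases are needed when two building blocks share a path-pair.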

Figure 4 summarizes, as a flowchart, the main procedures used for deriving the path-counting formula for $\Phi_{abc}$. The same procedures also apply to deriving the path-counting formulas for $\Phi_{abcd}$ and $\Phi_{ab,cd}$.

3. Results and Discussion

3.1. Path-Counting Formulas for Three Individuals. We first introduce a systematic method to generate all possible cases for a path-pair. Then, we discuss building blocks for path-triples and identify all acceptable cases, which are used in deriving the path-counting formula for $\Phi_{abc}$.

Figure 4: A flowchart for path-counting formula derivation.

3.1.1. Cases for a Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab} \rangle$ with $Bi\_C(P_{Aa}, P_{Ab}) \neq \text{NULL}$, where $A$ is a common ancestor of $a$ and $b$ and $Bi\_C(P_{Aa}, P_{Ab})$ consists of all common individuals shared between $P_{Aa}$ and $P_{Ab}$, except $A$, we introduce three patterns (i.e., crossover, 2-overlap, and root 2-overlap) to generate all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$:

(1) $X(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ share one or multiple crossover individuals;

(2) $T(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are root 2-overlapping from $A$, and the root 2-overlap path can have one or multiple 2-overlap individuals;

(3) $Y(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are overlapping, but not from $A$, and the 2-overlap path can have one or multiple 2-overlap individuals.

Based on the three patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$, we use regular expressions to generate all possible cases for the path-pair $\langle P_{Aa}, P_{Ab} \rangle$. For convenience, we drop $\langle P_{Aa}, P_{Ab} \rangle$ and use $X$, $T$, and $Y$ instead of the patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$ whenever there is no confusion. When $Bi\_C(P_{Aa}, P_{Ab}) \neq \text{NULL}$, the eight cases shown in (7) cover all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$. The completeness of the eight cases shown in (7) can be proved by induction on the total number of $T$, $X$, and $Y$ appearing in $\langle P_{Aa}, P_{Ab} \rangle$. Using the pedigree in Figure 2, Cases 1–3 and Case 6 are illustrated in (8), (9), (10), and (11).

$$
\begin{aligned}
&\text{Case 1: } T, \\
&\text{Case 2: } X^{+}, \\
&\text{Case 3: } TX^{+}, \\
&\text{Case 4: } T(X^{+}Y)^{+}, \\
&\text{Case 5: } T(X^{+}Y)^{+}X^{+}, \\
&\text{Case 6: } X^{+}Y, \\
&\text{Case 7: } X^{+}(YX^{+})^{+}, \\
&\text{Case 8: } X^{+}(YX^{+})^{+}Y.
\end{aligned}
\tag{7}
$$

$$
\left.
\begin{aligned}
P_{Aa} &: A \to s \to e \to t \to a \\
P_{Ab} &: A \to s \to e \to t \to b
\end{aligned}
\right\} \in T,
\tag{8}
$$


Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, showing the scenarios $S_0$–$S_3$ over the nodes $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$.

where $s$, $e$, and $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path.

$$
\left.
\begin{aligned}
P_{Aa} &: A \to s \to e \to t \to a \\
P_{Ab} &: A \to s \to f \to t \to b
\end{aligned}
\right\} \in TX,
\tag{9}
$$

where $s$ is a 2-overlap individual and the overlap path is a root 2-overlap path; $t$ is a crossover individual.

$$
\left.
\begin{aligned}
P_{Aa} &: A \to s \to e \to t \to a \\
P_{Ab} &: A \to d \to f \to t \to b
\end{aligned}
\right\} \in X,
\tag{10}
$$

where $t$ is a crossover individual.

$$
\left.
\begin{aligned}
P_{Aa} &: A \to c \to e \to t \to a \\
P_{Ab} &: A \to s \to e \to t \to b
\end{aligned}
\right\} \in XY,
\tag{11}
$$

where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.
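The eight regular expressions in (7) can be checked mechanically. The following is a minimal sketch, assuming that a path-pair has already been summarized as a string over the alphabet {T, X, Y}; this string encoding and the function names `classify` and `may_contribute_to_triple` are illustrative assumptions, not part of the original method.

```python
import re

# The eight cases of (7) as Python regular expressions over {T, X, Y}.
# Encoding a path-pair as such a string is an assumption of this sketch.
CASES = {
    1: r"T",
    2: r"X+",
    3: r"TX+",
    4: r"T(X+Y)+",
    5: r"T(X+Y)+X+",
    6: r"X+Y",
    7: r"X+(YX+)+",
    8: r"X+(YX+)+Y",
}


def classify(pattern):
    """Return the case number whose expression matches the whole string, else None."""
    for case, regex in CASES.items():
        if re.fullmatch(regex, pattern):
            return case
    return None


def may_contribute_to_triple(pattern):
    """Only Cases 1-3 contain no Y, so only they pass the test of Lemma 3."""
    return classify(pattern) in (1, 2, 3)


print(classify("TX"))   # the pattern of example (9)
print(classify("XY"))   # the pattern of example (11)
```

The eight expressions are mutually exclusive, so the order in which they are tried does not matter.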

3.1.2. Path-Pair Level Graphical Representation of a Path-Triple. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, we represent each path as a node. The path-triple can be decomposed to three path-pairs (i.e., $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual) except $A$, then there is an edge between the two nodes representing the two paths. Therefore, we obtain four different scenarios $S_0$–$S_3$, shown in Figure 5.

In Figure 5, the scenario $S_0$ has no edges, which means that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths. In Figure 2, path-triple 1 is an example of $S_0$. Next, we introduce a lemma that assists with identifying the options for the edges in the scenarios $S_1$–$S_3$.

Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. If there is a 2-overlap edge, which is represented by $Y$ in the regular expression representation of any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no contribution to $\Phi_{abc}$.

Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at allele level, starting from a triple-common ancestor $A$ to the three individuals $a$, $b$, and $c$.

Figure 6: Examples of pedigree and inheritance paths. (a) Pedigree over the individuals $A$, $p_1$–$p_8$, $a$, $b$, and $c$. (b) Inheritance paths.

For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ listed as follows: $P_{Aa}$: $A \to a$; $P_{Ab}$: $A \to p_3 \to p_6 \to p_7 \to b$; $P_{Ac}$: $A \to p_4 \to p_6 \to p_7 \to c$. For $\langle P_{Ab}, P_{Ac} \rangle$, $p_6$ is a crossover individual, $p_7$ is an overlap individual, and $p_6 \to p_7$ is a 2-overlap edge, represented by $Y$ in the regular expression representation (see the definition of $Y$ in Section 3.1.1).

For the individual $p_6$, let us denote the two alleles at one fixed autosomal locus as $g_1$ and $g_2$. At allele level, only one allele can be passed down from $p_6$ to $p_7$. Since $p_3$ and $p_4$ are parents of $p_6$, $g_1$ is passed down from one parent and $g_2$ is passed down from the other parent. It is infeasible to pass down both $g_1$ and $g_2$ from $p_6$ to $p_7$. In other words, there are no corresponding inheritance paths for the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with a 2-overlap edge in $\langle P_{Ab}, P_{Ac} \rangle$ (i.e., Case 6: $XY$). Therefore, such path-triples have no contribution to $\Phi_{abc}$.

Figure 6(b) shows one example of eligible inheritance paths corresponding to a pedigree graph. Each individual is represented by two allele nodes. The eligible inheritance paths in Figure 6(b) consist of red edges only.

Only Case 1, Case 2, and Case 3 do not have $Y$ in the regular expression representation of a path-pair (see (7)). Considering the scenarios $S_1$–$S_3$ shown in Figure 5, an edge can therefore have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$.

3.1.3. Constructing Cases for a Path-Triple. For the scenarios $S_1$–$S_3$ in Figure 5, we define two building blocks $\{B_1, B_2\}$, along with some rules in Figure 7, to generate acceptable cases. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, we cannot allow both edges to be root overlap, because if two edges are root overlap, then $P_{Aa}$ and $P_{Ac}$ must share at least one common individual except $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

Figure 7: Building blocks $\{B_1, B_2\}$ and basic rules. For $B_1$ (nodes $P_{Aa}$ and $P_{Ab}$), the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$ (nodes $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$), there can be at most one edge belonging to root overlap (either $T$ or $TX$).

Figure 8: A graphical illustration for obtaining $T_3$. The scenario $S_3$, with edges $e_1$, $e_2$, and $e_3$, is decomposed into $u_1$, $u_2$, and $u_3$, and $T_3 = R_1 \bowtie R_2 \bowtie R_3$. Note: $R_i$ denotes all acceptable path-triples for $u_i$.

Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ to $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples, denoted as $R_i$.

Considering the dependency among $\{R_1, R_2, R_3\}$, we use the natural join operator, denoted as $\bowtie$, operating on $\{R_1, R_2, R_3\}$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in the scenario $S_3$.

For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. The scenario $S_0$ has no edges, which shows that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or

(2) one edge belongs to root 2-overlap and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but zero 2-overlap paths.
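The join $T_3 = R_1 \bowtie R_2 \bowtie R_3$ can be illustrated with a small enumeration. This is a sketch under stated assumptions: each relation is a list of dictionaries from edge names to one of the three edge options, each $u_i$ is taken to be a $B_2$ block on two of the three edges of $S_3$ (which pair belongs to which $u_i$ is chosen here for illustration), and the helper names are hypothetical.

```python
from itertools import product

OPTIONS = ("T", "X", "TX")   # Case 1, Case 2, Case 3
ROOT = {"T", "TX"}           # options that involve a root 2-overlap


def b2_relation(edge_a, edge_b):
    """Acceptable labelings of a B2 block: at most one edge is a root overlap."""
    return [
        {edge_a: label_a, edge_b: label_b}
        for label_a, label_b in product(OPTIONS, repeat=2)
        if not (label_a in ROOT and label_b in ROOT)
    ]


def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    joined = []
    for row_r in r:
        for row_s in s:
            shared = row_r.keys() & row_s.keys()
            if all(row_r[k] == row_s[k] for k in shared):
                joined.append({**row_r, **row_s})
    return joined


# S3 is a triangle with edges e1, e2, e3 (assumed assignment of edges to u1..u3).
R1 = b2_relation("e1", "e2")
R2 = b2_relation("e2", "e3")
R3 = b2_relation("e1", "e3")
T3 = natural_join(natural_join(R1, R2), R3)

for row in T3:
    print(row)
```

Every row of `T3` has either three crossover edges or exactly one root-overlap edge, which agrees with the two options listed for $S_k$.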

3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals to a noncrossover path-triple without changing the contribution from this path-triple to $\Phi_{abc}$. The main purpose of the splitting operator is to simplify the path-counting formula derivation process. We first use an example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ into two nodes $s_1$ and $s_2$;

(2) transform the edges $s \to a'$ and $s \to b'$ to $s_1 \to a'$ and $s_2 \to b'$, respectively;

(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
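The three steps can be sketched on a child-adjacency map. The sketch assumes that node names are strings, that the graph is stored as a dictionary from each node to the set of its children, and that the incoming edges of $s$ are redirected as drawn in Figure 9 ($x \to s_1$ and $w \to s_2$); the function name `split_crossover` is hypothetical.

```python
def split_crossover(children, s, x, w):
    """Split crossover node s, entered from x on one path and from w on the other.

    `children` maps each node to the set of its children. A new mapping is
    returned in which s is replaced by two copies, s + "_1" and s + "_2".
    """
    graph = {node: set(kids) for node, kids in children.items()}
    s1, s2 = s + "_1", s + "_2"
    kids_of_s = graph.pop(s)              # step (1): s disappears
    graph[x].remove(s)
    graph[x].add(s1)                      # x -> s1, as drawn in Figure 9
    graph[w].remove(s)
    graph[w].add(s2)                      # w -> s2, as drawn in Figure 9
    graph[s1] = set(kids_of_s)            # steps (2) and (3): s1 -> a', s1 -> b'
    graph[s2] = set(kids_of_s)            # steps (2) and (3): s2 -> b', s2 -> a'
    return graph


G = {
    "A": {"x", "w", "c"}, "x": {"s"}, "w": {"s"}, "s": {"a1", "b1"},
    "a1": {"a"}, "b1": {"b"}, "a": set(), "b": set(), "c": set(),
}
G_split = split_crossover(G, "s", "x", "w")
print(sorted(G_split["s_1"]), sorted(G_split["s_2"]))
```

After the split, the paths $A \to x \to s_1 \to a' \to a$ and $A \to w \to s_2 \to b' \to b$ share no individual other than $A$, so $s$ is no longer a crossover for this pair.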

Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.

Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If a new crossover node appears, the only possible node is either $a'$ or $b'$. Assume that $b'$ becomes a crossover individual; this means that $b'$ is able to reach $a$ and $b$ from two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.

Next, we introduce a canonical graph, which results from applying the splitting operator to all crossover individuals. The canonical graph has zero crossover individuals.

Definition 5 (Canonical Graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, if there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$,

(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$,

we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.

Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, there exists a canonical graph $G'$ for $G$.


Figure 9: Transforming pedigree graph $G_{k+1}$ having $k + 1$ crossovers to $G_k$ having $k$ crossovers. For $G_{k+1}$, $\langle P \rangle = \{P_{Aa}, P_{Ab}, P_{Ac}\}$ with $P_{Aa}$: $A \to \cdots \to x \to s \to a' \to \cdots \to a$; $P_{Ab}$: $A \to \cdots \to w \to s \to b' \to \cdots \to b$; $P_{Ac}$: $A \to c$. For $G_k$, $\langle P \rangle = \{P_{Aa}, P_{Ab}, P_{Ac}\}$ with $P_{Aa}$: $A \to \cdots \to x \to s_1 \to a' \to \cdots \to a$; $P_{Ab}$: $A \to \cdots \to w \to s_2 \to b' \to \cdots \to b$; $P_{Ac}$: $A \to c$. Legend: ancestor-descendant relationship; parent-child relationship.

Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, showing the scenarios $S_0$–$S_{10}$ over the nodes $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$.

Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers, by Lemma 4. By the induction hypothesis, $G_k$ has a canonical graph $G'$; since the splitting operator does not change the contribution of a path-triple to $\Phi_{abc}$, $G'$ is also a canonical graph of $G_{k+1}$.

3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:

$$
\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AA} \right),
\tag{12}
$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

$$
L_{\text{triple}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2},
\end{cases}
\tag{13}
$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).

For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
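To make (12) and (13) concrete, the contribution of a single acceptable path-triple can be evaluated with exact rational arithmetic. This is a minimal sketch: the function names and the argument `root_overlap_len` (standing for $L_{P_{As}}$) are illustrative assumptions, and enumerating the path-triples themselves is not shown.

```python
from fractions import Fraction


def phi_AA(f_a):
    """Phi_AA = (1/2)(1 + F_A)."""
    return Fraction(1, 2) * (1 + f_a)


def phi_AAA(f_a):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * f_a)


def triple_contribution(len_a, len_b, len_c, f_a=Fraction(0), root_overlap_len=None):
    """Contribution of one acceptable path-triple to Phi_abc, following (12)-(13).

    root_overlap_len is L_{P_As} for a Type 2 triple, or None for Type 1.
    """
    total = len_a + len_b + len_c
    if root_overlap_len is None:                 # Type 1
        return Fraction(1, 2) ** total * phi_AAA(f_a)
    l_triple = total - root_overlap_len          # Type 2
    return Fraction(1, 2) ** (l_triple + 1) * phi_AA(f_a)


# Three children of a non-inbred A: three paths of length 1, Type 1.
print(triple_contribution(1, 1, 1))                       # 1/32
# A -> s -> a, A -> s -> b, A -> c: root 2-overlap A -> s of length 1, Type 2.
print(triple_contribution(2, 2, 1, root_overlap_len=1))   # 1/64
```

Summing such contributions over all triple-common ancestors $A$ and all acceptable path-triples gives $\Phi_{abc}$ as in (12).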

3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ and $Quad\_C(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios $S_0$–$S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks $\{B_1, B_2, B_3\}$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap, because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that for $B_3$, if $Tri\_C(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $Tri\_C(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.

3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ to one or multiple building blocks. For a scenario $S_i \in \{S_1, S_3\}$, it has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$, because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.


Figure 11: Building blocks $B_1$, $B_2$, and $B_3$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For $B_3$, with $Tri\_C(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$, all three edges belong to root overlap (i.e., having root 3-overlap).

Table 2: Largest subgraph $S_j$ of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).

$S_i$    $S_4$    $S_5$    $S_7$    $S_8$    $S_9$    $S_{10}$
$S_j$    $S_3$    $S_3$    $S_6$    $S_5$    $S_7$    $S_9$

For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;

(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;

(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.

For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply this idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;

(2) otherwise, there can be at most two root 2-overlap paths.

3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$
\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 2} \Phi_{AA} \right),
\tag{14}
$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$.

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;

Case 2: one root 3-overlap path $P_{At}$ ending at $t$;

Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively.

$$
L_{\text{quad}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 * L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases}
\tag{15}
$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).

For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
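The case analysis in (14) and (15) reduces to a small dispatch on how many root overlaps a path-quad has. The sketch below assumes that the root 2-overlap and root 3-overlap lengths of an acceptable path-quad are already known; the function name `quad_contribution` and its parameters are hypothetical.

```python
from fractions import Fraction


def quad_contribution(path_lengths, two_overlaps=(), three_overlap=None, f_a=Fraction(0)):
    """Contribution of one acceptable path-quad to Phi_abcd, following (14)-(15).

    path_lengths  : lengths of P_Aa, P_Ab, P_Ac, P_Ad
    two_overlaps  : lengths of the root 2-overlap paths (zero, one, or two entries)
    three_overlap : length of the root 3-overlap path, or None if there is none
    """
    n2 = len(two_overlaps)
    has3 = three_overlap is not None
    if n2 > 2 or (has3 and n2 > 1):
        raise ValueError("not an acceptable path-quad")
    l_quad = sum(path_lengths) - sum(two_overlaps)
    if has3:
        # Case 2 subtracts 2 * L_{P_At}; Case 3 subtracts L_{P_At} once.
        l_quad -= three_overlap * (2 if n2 == 0 else 1)
    half = Fraction(1, 2)
    if not has3 and n2 == 0:                                     # Type 1
        return half ** l_quad * Fraction(1, 8) * (1 + 7 * f_a)
    if not has3 and n2 == 1:                                     # Type 2
        return half ** (l_quad + 1) * Fraction(1, 4) * (1 + 3 * f_a)
    return half ** (l_quad + 2) * Fraction(1, 2) * (1 + f_a)     # Type 3


print(quad_contribution([1, 1, 1, 1]))                     # 1/128
print(quad_contribution([2, 2, 1, 1], two_overlaps=[1]))   # 1/256
print(quad_contribution([2, 2, 2, 1], three_overlap=1))    # 1/256
```

The guard at the top mirrors the two properties of an acceptable path-quad: at most one root 2-overlap path alongside a root 3-overlap path, and at most two root 2-overlap paths otherwise.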


Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ = $\{A \to s \to a,\ A \to s \to b,\ A \to t \to c,\ A \to t \to d\}$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ = $\{A \to x \to a,\ A \to y \to b,\ A \to y \to c,\ A \to x \to d\}$.

3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two pairs of path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in Bi\_C(P_{Aa}, P_{Ab})$ (or $s \in Bi\_C(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in Bi\_C(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.

Example 8. $A$ is a quad-common ancestor for $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.

3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge can be $\{T, X, TX\}$ (refer to Section 3.1.1 for the definitions of $T$, $X$, and $TX$). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define that the generation of a progenitor

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap
 | Zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap
 | Zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \neq h$
 | One root homo-overlap and two root heter-overlaps, and $h = r_1 \neq r_2$

$p_i$ is 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
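The generation of every individual follows directly from this recursive definition. A minimal sketch, assuming the pedigree is given as a dictionary from each individual to a tuple of its known parents (empty for a progenitor); the names `make_gen` and `parents` are illustrative.

```python
from functools import lru_cache


def make_gen(parents):
    """Build gen() for a pedigree.

    `parents` maps each individual to a tuple of its known parents
    (empty for a progenitor).
    """
    @lru_cache(maxsize=None)
    def gen(person):
        known = parents.get(person, ())
        if not known:                                # progenitor: gen = 0
            return 0
        return max(gen(p) for p in known) + 1        # one or two parents

    return gen


pedigree = {
    "p1": (), "p2": (), "m": (),
    "f": ("p1",),          # only one known parent
    "a": ("f", "m"),       # two parents in different generations
    "b": ("f", "p2"),
}
gen = make_gen(pedigree)
print(gen("f"), gen("a"), gen("b"))
```

Memoization keeps the computation linear in the number of parent-child edges, since each individual is evaluated once.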

The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$
\begin{aligned}
\Phi_{ab,cd} = {} & \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}} + 2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}} + 1} \Phi_{AA} \right) \\
& + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB},
\end{aligned}
\tag{16}
$$

where $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$: a common ancestor of $a$ and $b$; and $T$: a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).


Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level, showing the scenarios $S_0$–$S_{16}$ over the nodes $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$.

Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

Type 3:

$$
\begin{cases}
\text{zero root homo-overlap and two root heter-overlaps } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2, \text{ respectively;} \\
\text{one root homo-overlap } P_{Ah} \text{ ending at } h \text{ and two root heter-overlaps } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2, \text{ and } r_1 = r_2.
\end{cases}
\tag{17}
$$

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individuals, and $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individuals. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$
B =
\begin{cases}
S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\
S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\
T & \text{otherwise},
\end{cases}
$$

$$
\begin{aligned}
L_{\text{2-pair}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 * L_{P_{Ah}} & \text{for Type 4},
\end{cases} \\
L_{\langle P_{Sa}, P_{Sb} \rangle} &= L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5}, \\
L_{\langle P_{Tc}, P_{Td} \rangle} &= L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}.
\end{aligned}
\tag{18}
$$
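The choice of $B$ and the Type 5 term of (16) can be written down directly from the case definition. This is a sketch under stated assumptions: generations and parent counts are supplied as precomputed dictionaries, and the names `select_b` and `type5_contribution` are hypothetical.

```python
from fractions import Fraction


def select_b(s, t, gen, num_parents):
    """Choose B from the pair (S, T) according to the three-way case definition.

    gen         : dict mapping an individual to its generation
    num_parents : dict mapping an individual to its number of parents
    """
    if gen[s] < gen[t]:
        return s
    if gen[s] == gen[t] and num_parents[t] == 2:
        return s
    return t


def type5_contribution(len_sa, len_sb, len_tc, len_td, f_b=Fraction(0)):
    """Type 5 term of (16): (1/2)^(L<P_Sa,P_Sb> + L<P_Tc,P_Td> + 1) * Phi_BB."""
    phi_bb = Fraction(1, 2) * (1 + f_b)
    exponent = (len_sa + len_sb) + (len_tc + len_td) + 1
    return Fraction(1, 2) ** exponent * phi_bb


print(select_b("S", "T", {"S": 1, "T": 2}, {"S": 2, "T": 2}))
print(type5_contribution(1, 1, 1, 1))
```

Here `f_b` stands for the inbreeding coefficient of $B$, by analogy with $\Phi_{AA} = (1/2)(1 + F_A)$.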

Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$
\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} * \Phi_{TT}.
\tag{19}
$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.

3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by comparing it with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node is a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor $r$ as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as NC($u$), is assigned using a breadth-first-search traversal starting from $r$ as follows.

(1) If $u$ is $r$, then NC($r$) contains only one element, the empty string.

(2) Otherwise, let $u$ be a node with NC($u$) and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in NC($u$), a code $x\,i\,{*}$ is added to NC($v_i$), where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
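The assignment rules (1) and (2) can be sketched in Python as follows. This is a minimal illustration, not the implementation used in [11] or [18]. The function name `assign_nodecodes`, the dictionary-based graph representation, the gender letters `M`/`F`, and the trailing `.` separator are all assumptions made for this example. The sketch also processes a node only after all of its parents have been processed (a queue driven by in-degree), which is one way to make sure every code of a parent is known before it is extended.

```python
from collections import deque


def assign_nodecodes(children, gender, root):
    """Assign NodeCodes to every node reachable from root.

    children: dict mapping a node to the list of its children in sibling order.
    gender:   dict mapping a node to a one-letter gender mark (assumed 'M'/'F').
    root:     the single progenitor r (in-degree 0).
    Returns a dict mapping each node to its set of codes (strings).
    """
    # Count incoming edges so a node is expanded only after all its parents.
    indegree = {}
    for parent, kids in children.items():
        indegree.setdefault(parent, 0)
        for kid in kids:
            indegree[kid] = indegree.get(kid, 0) + 1

    codes = {node: set() for node in indegree}
    codes[root].add("")  # Rule (1): NC(r) holds only the empty string.

    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            # Rule (2): extend each code x of u with the sibling index i
            # and the gender mark of child v_i.
            for x in codes[u]:
                codes[v].add(f"{x}{i}{gender[v]}.")
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return codes
```

Each code of a node then spells out one path from the progenitor, so an individual with two parents that are both reachable from $r$ receives at least two codes.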


The computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.

Figure 14: The effect of pedigree size on computation efficiency improvement. (Plot of average time (ms) against the number of individuals in the pedigree, for Recursive and NodeCodes.)

Figure 15: The effect of depth on computation efficiency improvement. (Plot of average time (ms) against depth, from 4 to 18, for Recursive and NodeCodes.)

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.

Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2}\rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.


Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$. (Panel labels: $S_0$, $S_1$, $S_2$, $S_3$; if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the two paths are drawn as a single path $P_{Aa}$.)

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2}\rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

\[
\Phi_{aab} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\tfrac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\tfrac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA} + \sum_{\text{Type 3}}\left(\tfrac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}\right), \tag{A.1}
\]

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has no root 2-overlap;

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair.

\[
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases}\\
L_{\langle P_{Aa}, P_{Ab}\rangle} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A.2}
\]
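As a small sanity check of (A.1), consider a hypothetical pedigree (an assumption made for illustration, not one of the paper's figures) in which a noninbred founder $A$ is a parent of both $a$ and $b$, and the other parents of $a$ and $b$ are unrelated founders. The only path from $A$ to $a$ is the single edge $A \rightarrow a$, so $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is always mergeable and only Type 3 contributes, with $L_{\langle P_{Aa}, P_{Ab}\rangle} = 1 + 1 = 2$. Assuming $\Phi_{AA} = 1/2$ for a noninbred founder,

\[
\Phi_{aab} = \left(\tfrac{1}{2}\right)^{2+1}\Phi_{AA} = \tfrac{1}{8}\cdot\tfrac{1}{2} = \tfrac{1}{16}.
\]

This agrees with the recursive formula (4), $\Phi_{aab} = \tfrac{1}{2}(\Phi_{ab} + \Phi_{fmb})$: here $\Phi_{fmb} = 0$ because the second parent of $a$ is unrelated to $b$, and Wright's formula gives $\Phi_{ab} = (\tfrac{1}{2})^{2}\Phi_{AA} = \tfrac{1}{8}$, so $\Phi_{aab} = \tfrac{1}{2}\cdot\tfrac{1}{8} = \tfrac{1}{16}$.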

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.

A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

\[
\begin{aligned}
\Phi_{aabc} ={}& \sum_{A}\left(\sum_{\text{Type 1}}\left(\tfrac{1}{2}\right)^{L_{\text{quad}}-1}\Phi_{AAAA} + \sum_{\text{Type 2}}\left(\tfrac{1}{2}\right)^{L_{\text{quad}}}\Phi_{AAA} + \sum_{\text{Type 3}}\left(\tfrac{1}{2}\right)^{L_{\text{quad}}+1}\Phi_{AA}\right)\\
&+ \sum_{A}\left(\sum_{\text{Type 4}}\left(\tfrac{1}{2}\right)^{L_{\text{triple}}+1}\Phi_{AAA} + \sum_{\text{Type 5}}\left(\tfrac{1}{2}\right)^{L_{\text{triple}}+2}\Phi_{AA}\right),
\end{aligned}
\tag{A.3}
\]

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path;

Type 2: one root 2-overlap path $P_{As}$ ending at $s$;

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;

Case 2: one root 3-overlap path $P_{At}$ ending at $t$;

Case 3: one root 2-overlap path and one root 3-overlap path, $P_{As}$ and $P_{At}$, ending at $s$ and $t$, respectively. (A.4)

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path;

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

\[
\begin{aligned}
L_{\text{quad}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases}\\
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}.
\end{cases}
\end{aligned}
\tag{A.5}
\]


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 bc}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. Therefore, for a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as a default mergeable path-pair. Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

\[
\Phi_{aaab} = \sum_{A}\left(\frac{3}{2}\left(\sum_{\text{Type 1}}\left(\tfrac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\tfrac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\right) + \sum_{\text{Type 3}}\left(\tfrac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\right), \tag{A.6}
\]

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path;

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.

\[
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1},\\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases}\\
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A.7}
\]
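To illustrate the Type 3 term of (A.6), take a hypothetical pedigree (an assumption for illustration only) in which a noninbred founder $A$ is a parent of both $a$ and $b$, with the remaining parents unrelated. There is a single path $A \rightarrow a$, so $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable, Types 1 and 2 are empty, and $L_{\text{pair}} = 1 + 1 = 2$. Assuming $\Phi_{AA} = 1/2$,

\[
\Phi_{aaab} = \left(\tfrac{1}{2}\right)^{2+2}\Phi_{AA} = \tfrac{1}{16}\cdot\tfrac{1}{2} = \tfrac{1}{32}.
\]

An informal allele-level reading gives the same value: three alleles sampled from $a$ are all the one inherited from $A$ with probability $(\tfrac{1}{2})^{3}$, the allele sampled from $b$ is the one inherited from $A$ with probability $\tfrac{1}{2}$, and $A$ transmitted the same allele to both children with probability $\tfrac{1}{2}$, so the product is $\tfrac{1}{32}$.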

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$.

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases for $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap;

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap;

Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

Cases for $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap;

Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap. (B.1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Legend: parent-child relationship; ancestor-descendant relationship.)

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (\tfrac{1}{2})\Phi_{cbc} = (\tfrac{1}{2})^{2}\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (\tfrac{1}{2})\Phi_{Abc} = (\tfrac{1}{2})^{2}\Phi_{AAc} = (\tfrac{1}{2})^{3}\Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}} (\tfrac{1}{2})^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor and we obtain $\Phi_{abc} = (\tfrac{1}{2})^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc} = (\tfrac{1}{2})^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (\tfrac{1}{2})^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = (\tfrac{1}{2})^{3}\Phi_{AAA}$.

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show that it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k+1)$ edges. If we remove the node $a$ and cut the edge $f \rightarrow a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (\tfrac{1}{2})^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (\tfrac{1}{2})(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are the parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

\[
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 1}}\left(\tfrac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \sum_{\text{Type 1}}\left(\tfrac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA},\\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 1}}\left(\tfrac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}.
\end{aligned}
\tag{B.2}
\]

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (\tfrac{1}{2})^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1}\Phi_{ccc} = \sum_{\text{Type 1}} (\tfrac{1}{2})^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc}$.

Thus it is true for $n = k + 1$.

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.

Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (\tfrac{1}{2})\Phi_{bbc} = (\tfrac{1}{2})^{2}\Phi_{bc} = (\tfrac{1}{2})^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (\tfrac{1}{2})\Phi_{bbc} = (\tfrac{1}{2})^{2}\Phi_{bc} = (\tfrac{1}{2})^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (\tfrac{1}{2})^{2}\Phi_{ssc} = (\tfrac{1}{2})^{3}\Phi_{sc} = (\tfrac{1}{2})^{5}\Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}} (\tfrac{1}{2})^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor and we obtain $\Phi_{abc} = (\tfrac{1}{2})^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}+1}\Phi_{cc} = (\tfrac{1}{2})^{2+1}\Phi_{cc} = (\tfrac{1}{2})^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (\tfrac{1}{2})^{4}\Phi_{AA}$ and $\Phi_{abc} = (\tfrac{1}{2})^{5}\Phi_{AA}$, respectively.
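The basis cases can be checked numerically with a short Python sketch of the per-path-triple contribution in formula (12), as it is used in Cases 3.1 and 3.2. The function name `triple_term` and its parameters are hypothetical. The default values $\Phi_{AAA} = 1/4$ and $\Phi_{AA} = 1/2$ assume that the common ancestor is a noninbred founder, and the path lengths passed in the usage note are read off the captions of Figures 18 and 20, so they are assumptions about the drawn pedigrees.

```python
from fractions import Fraction


def triple_term(len_a, len_b, len_c, overlap_len=None,
                phi_AAA=Fraction(1, 4), phi_AA=Fraction(1, 2)):
    """Contribution of one path-triple <P_Aa, P_Ab, P_Ac> to Phi_abc.

    len_a, len_b, len_c: numbers of edges of P_Aa, P_Ab, P_Ac.
    overlap_len: None for Type 1 (no root overlap); otherwise the number
                 of edges of the root overlap path P_As (Type 2).
    phi_AAA, phi_AA: coefficients of the ancestor A (assumed defaults
                     correspond to a noninbred founder).
    """
    half = Fraction(1, 2)
    if overlap_len is None:
        # Type 1: (1/2)^L * Phi_AAA with L = L_PAa + L_PAb + L_PAc.
        return half ** (len_a + len_b + len_c) * phi_AAA
    # Type 2: (1/2)^(L + 1) * Phi_AA with L reduced by the overlap length.
    length = len_a + len_b + len_c - overlap_len
    return half ** (length + 1) * phi_AA
```

For Figure 20(a), taking $P_{ca}$, $P_{cb}$, $P_{cc}$ with 2, 1, and 0 edges and a root overlap of 1 edge, `triple_term(2, 1, 0, 1)` evaluates $(\tfrac{1}{2})^{3}\Phi_{cc}$; `triple_term(2, 1, 1, 1)` and `triple_term(2, 2, 1, 1)` give the $(\tfrac{1}{2})^{4}\Phi_{AA}$ and $(\tfrac{1}{2})^{5}\Phi_{AA}$ terms of Figures 20(b) and 20(c).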

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path and let $p$ be a parent of $a$. Then we cut the edge $p \rightarrow a$ from $G$ and obtain a new graph $G^{*}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (\tfrac{1}{2})^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (\tfrac{1}{2})\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

\[
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 2}}\left(\tfrac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\tfrac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA},\\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 2}}\left(\tfrac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\tfrac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}.
\end{aligned}
\tag{B.3}
\]

For Figures 21(b) and 21(c), we take the same steps as we calculate $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}} (\tfrac{1}{2})^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (\tfrac{1}{2})(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,

\[
\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B.4}
\]

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (\tfrac{1}{2})^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.

Then we have $\Phi_{aab} = (\tfrac{1}{2})\Phi_{ab} = (\tfrac{1}{2})\sum_{P} (\tfrac{1}{2})^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA} = \sum_{P} (\tfrac{1}{2})^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, it shows that the path-counting formula (A.1) is true for Case 2.3.
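The reduction of Case 2.3 to Wright's formula can be illustrated with a small Python sketch that enumerates nonoverlapping path-pairs explicitly. This brute-force enumeration is only meant for tiny pedigrees and is not the NodeCodes-based method of the paper. The function names, the `children` adjacency dictionary, and the single `phi_AA` value shared by all ancestors (1/2, i.e., noninbred ancestors) are assumptions made for the example.

```python
from fractions import Fraction


def paths(children, src, dst):
    """Return all directed paths from src to dst as tuples of nodes."""
    if src == dst:
        return [(dst,)]
    found = []
    for child in children.get(src, []):
        for tail in paths(children, child, dst):
            found.append((src,) + tail)
    return found


def wright_kinship(children, ancestors, a, b, phi_AA=Fraction(1, 2)):
    """Sum (1/2)^(L_PAa + L_PAb) * Phi_AA over nonoverlapping path-pairs."""
    total = Fraction(0)
    for A in ancestors:
        for pa in paths(children, A, a):
            for pb in paths(children, A, b):
                # Nonoverlapping: the paths share no individual except A.
                if set(pa[1:]) & set(pb[1:]):
                    continue
                edges = (len(pa) - 1) + (len(pb) - 1)
                total += Fraction(1, 2) ** edges * phi_AA
    return total


def aab_single_parent(children, ancestors, a, b):
    """Phi_aab = (1/2) * Phi_ab when a has only one known parent, as in (B.4)."""
    return Fraction(1, 2) * wright_kinship(children, ancestors, a, b)
```

For a hypothetical pedigree with edges $A \rightarrow f$, $f \rightarrow a$, and $A \rightarrow b$, the only path-pair has three edges in total, so the sketch returns $(\tfrac{1}{2})^{3}\Phi_{AA}$ for $\Phi_{ab}$ and $(\tfrac{1}{2})^{3+1}\Phi_{AA}$ for $\Phi_{aab}$, matching the Type 3 term of (A.1).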

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).

Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. Proof by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, (12) and (A.1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathrm{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathrm{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$, which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathrm{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \rightarrow \cdots \rightarrow a,\ t_b\colon A_1 \rightarrow \cdots \rightarrow b,\ t_c\colon A_1 \rightarrow \cdots \rightarrow c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, we know that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
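The choice of $A_1$ in the induction step can be made concrete with a short Python sketch. It assumes that an individual counts as its own ancestor for this purpose (consistent with Figure 18(a), where $c$ is the triple-common ancestor of $a$, $b$, and $c$) and that the pedigree is given as a hypothetical `parents` dictionary mapping each individual to the list of its known parents; the function names are illustrative only.

```python
def ancestors_or_self(parents, x):
    """Return x together with all of its ancestors."""
    seen, stack = set(), [x]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(parents.get(u, []))
    return seen


def triple_common_ancestors(parents, a, b, c):
    """Individuals that are ancestors (or self) of a, b, and c at once."""
    return (ancestors_or_self(parents, a)
            & ancestors_or_self(parents, b)
            & ancestors_or_self(parents, c))


def topmost(parents, candidates):
    """Candidates that have no other candidate among their ancestors."""
    return {A for A in candidates
            if not ((ancestors_or_self(parents, A) - {A}) & candidates)}
```

Any element returned by `topmost` can play the role of $A_1$: removing it and its outgoing edges leaves a graph with one fewer triple-common ancestor, which is the graph $G'$ used in the proof.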

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two-pair individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Cases for $\Phi_{aaab}$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap;

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap;

Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.

Cases for $\Phi_{aabc}$:

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap;

Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap;

Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlap;

Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap;

Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap;

Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap;

Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap;

Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap;

Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlap;

Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap;

Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap. (C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

Figure 23: Dependency graph for different cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps including Cases 3.4 and 3.5 are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formula for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

C.2. Proof for Two Pairs of Individuals. Consider the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$; there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlaps.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlaps.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlaps and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then, we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformations from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Figure 24: Dependency graph for different cases for two pairs of individuals. (Nodes: Cases 4.1, 4.2, 4.3.1, 4.3.2, 4.4, 4.6, 4.7, and 4.8, together with $\Phi_{AAAA}$, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{TT}$.)

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] "Surgeon General's New Family Health History Tool Is Released, Ready for '21st Century Medicine'," http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Ohio State University, Columbus, Ohio, USA, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malecot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.



Harris [8] and promulgated by Jacquard [9]. Considering the four alleles of two individuals at a fixed autosomal locus, there are 15 possible identity states. Disregarding the distinction between maternally and paternally derived alleles, we obtain 9 condensed identity states. The probabilities associated with each condensed identity state are called condensed identity coefficients, which are useful in a diverse range of fields. This includes the calculation of risk ratios for qualitative disease, the analysis of quantitative traits, and genetic counseling in medicine.

A recursive algorithm for calculating condensed identity coefficients, proposed by Karigl [10], has been known for some time. This method requires that one calculates a set of generalized kinship coefficients, from which one obtains condensed identity coefficients via a linear transformation. One limitation is that this recursive approach is not scalable when applied to very large pedigrees. It has been previously shown that the kinship coefficients for two individuals [11–13] and the generalized kinship coefficients for three individuals [14, 15] can be efficiently calculated using path-counting formulas together with path encoding schemes tailored for pedigree graphs.

Motivated by the efficiency of path-counting formulas for computing the kinship coefficient for two individuals and the generalized kinship coefficient for three individuals, we first introduce a framework for developing path-counting formulas to compute generalized kinship coefficients concerning three individuals, four individuals, and two pairs of individuals. Then, we present path-counting formulas for all generalized kinship coefficients which have recursive formulas proposed by Karigl [10] and are sufficient to compute condensed identity coefficients. In summary, our ultimate goal is to use path-counting formulas for generalized kinship coefficients computation so that efficiency and scalability for condensed identity coefficients calculation can be improved.

The main contributions of our work are as follows:

(i) a framework to develop path-counting formulas for generalized kinship coefficients;

(ii) a set of path-counting formulas for all generalized kinship coefficients having recursive formulas [10];

(iii) experimental results demonstrating significant performance gains for calculating condensed identity coefficients based on our proposed path-counting formulas, as compared to using recursive formulas [10].

2. Materials and Methods

This section describes kinship coefficients and generalized kinship coefficients, identity coefficients, and condensed identity coefficients in more detail. Conceptual terms for the path-counting formulas for three and four individuals are introduced in Section 2.3. In addition, an overview of path-counting formula derivation is presented.

2.1. Kinship Coefficients and Generalized Kinship Coefficients. The kinship coefficient between two individuals $a$ and $b$ is the probability that a randomly chosen allele at the same locus from each is identical by descent (IBD). There are two approaches to computing the kinship coefficient $\Phi_{ab}$: the recursive approach [10] and the path-counting approach [16]. The recursive formulas [10] for $\Phi_{ab}$ and $\Phi_{aa}$ are

$$
\begin{aligned}
\Phi_{ab} &= \frac{1}{2}\left(\Phi_{fb} + \Phi_{mb}\right) \quad \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aa} &= \frac{1}{2}\left(1 + \Phi_{fm}\right) = \frac{1}{2}\left(1 + F_a\right),
\end{aligned}
\tag{1}
$$

where $f$ and $m$ denote the father and the mother of $a$, respectively, and $F_a$ is the inbreeding coefficient of $a$.
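As an illustration of the recursion in (1), the following Python sketch evaluates $\Phi_{ab}$ with exact fractions. The toy pedigree, the names `PEDIGREE`, `depth`, and `kinship`, and the rule of expanding the individual with the longer ancestral chain (one simple way to guarantee that the expanded individual is not an ancestor of the other) are assumptions made for this example; they are not taken from [10].

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy pedigree: individual -> (father, mother); None marks an unknown parent.
PEDIGREE = {
    "gf": (None, None),
    "gm": (None, None),
    "s1": ("gf", "gm"),
    "s2": ("gf", "gm"),
    "x": ("s1", "s2"),  # offspring of a full-sib mating
}


@lru_cache(maxsize=None)
def depth(ind):
    """Length of the longest chain of known ancestors above `ind` (-1 for an unknown parent)."""
    if ind is None:
        return -1
    father, mother = PEDIGREE[ind]
    return 1 + max(depth(father), depth(mother))


@lru_cache(maxsize=None)
def kinship(a, b):
    """Phi_ab from the recursion in (1), returned as an exact fraction."""
    if a is None or b is None:
        return Fraction(0)  # assumption: unknown parents are unrelated to everyone
    if a == b:
        father, mother = PEDIGREE[a]
        return (1 + kinship(father, mother)) / 2  # Phi_aa = (1 + F_a) / 2
    if depth(a) < depth(b):
        a, b = b, a  # the deeper individual cannot be an ancestor of the other
    father, mother = PEDIGREE[a]
    return (kinship(father, b) + kinship(mother, b)) / 2
```

For this pedigree, `kinship("s1", "s2")` evaluates to 1/4 and `kinship("x", "x")` to 5/8, which corresponds to an inbreeding coefficient $F_x = 1/4$ for the offspring of a full-sib mating.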

Wright's path-counting formula [16] for $\Phi_{ab}$ is

$$
\Phi_{ab} = \sum_{A} \sum_{\langle P_{Aa}, P_{Ab}\rangle \in PP} \left(\frac{1}{2}\right)^{r+s+1} \left(1 + F_A\right),
\tag{2}
$$

where $A$ is a common ancestor of $a$ and $b$, $PP$ is a set of nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ from $A$ to $a$ and $b$, $r$ is the length of the path $P_{Aa}$, $s$ is the length of the path $P_{Ab}$, and $F_A$ is the inbreeding coefficient of $A$. The path-pair $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping if and only if the two paths share no common individuals except $A$.

Recursive formulas proposed by Karigl [10] for generalized kinship coefficients concerning three individuals, four individuals, and two pairs of individuals are listed in (3), (4), and (5), respectively:

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\left(\Phi_{fbc} + \Phi_{mbc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{aab} &= \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) && \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aaa} &= \frac{1}{4}\left(1 + 3\Phi_{fm}\right) = \frac{1}{4}\left(1 + 3F_a\right),
\end{aligned}
\tag{3}
$$

$$
\begin{aligned}
\Phi_{abcd} &= \frac{1}{2}\left(\Phi_{fbcd} + \Phi_{mbcd}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c \text{ or } d, \\
\Phi_{aabc} &= \frac{1}{2}\left(\Phi_{abc} + \Phi_{fmbc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{aaab} &= \frac{1}{4}\left(\Phi_{ab} + 3\Phi_{fmb}\right) && \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aaaa} &= \frac{1}{8}\left(1 + 7\Phi_{fm}\right) = \frac{1}{8}\left(1 + 7F_a\right),
\end{aligned}
\tag{4}
$$

$$
\begin{aligned}
\Phi_{ab,cd} &= \frac{1}{2}\left(\Phi_{fb,cd} + \Phi_{mb,cd}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c \text{ or } d, \\
\Phi_{aa,bc} &= \frac{1}{2}\left(\Phi_{bc} + \Phi_{fm,bc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{ab,ac} &= \frac{1}{4}\left(2\Phi_{abc} + \Phi_{fb,mc} + \Phi_{mb,fc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{aa,ab} &= \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) && \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aa,aa} &= \frac{1}{4}\left(1 + 3\Phi_{fm}\right) = \frac{1}{4}\left(1 + 3F_a\right).
\end{aligned}
\tag{5}
$$

$\Phi_{abc}$ is the probability that randomly chosen alleles at the same locus from each of the three individuals (i.e., $a$, $b$, and $c$) are identical by descent (IBD). Similarly, $\Phi_{abcd}$ is the probability that randomly chosen alleles at the same locus from each of the four individuals (i.e., $a$, $b$, $c$, and $d$) are IBD. $\Phi_{ab,cd}$ is the probability that a random allele from $a$ is IBD with a random allele from $b$ and that a random allele from $c$ is IBD with a random allele from $d$ at the same locus. Note that $\Phi_{abc} = 0$ if there is no common ancestor of $a$, $b$, and $c$; $\Phi_{abcd} = 0$ if there is no common ancestor of $a$, $b$, $c$, and $d$; and $\Phi_{ab,cd} = 0$ in the absence of a common ancestor either for $a$ and $b$ or for $c$ and $d$.
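To make the three-individual recursion in (3) concrete, the sketch below evaluates $\Phi_{abc}$ on a small pedigree. It assumes the same hypothetical conventions as before (a toy `PEDIGREE` dictionary, unknown parents unrelated to everyone, and expansion of the individual with the longest ancestral chain so that the side conditions of (3) hold); the function names `phi2` and `phi3` are illustrative.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy pedigree: individual -> (father, mother); None marks an unknown parent.
PEDIGREE = {
    "gf": (None, None),
    "gm": (None, None),
    "s1": ("gf", "gm"),
    "s2": ("gf", "gm"),
    "x": ("s1", "s2"),
}


@lru_cache(maxsize=None)
def depth(ind):
    """Length of the longest chain of known ancestors above `ind` (-1 for an unknown parent)."""
    if ind is None:
        return -1
    father, mother = PEDIGREE[ind]
    return 1 + max(depth(father), depth(mother))


@lru_cache(maxsize=None)
def phi2(a, b):
    """Phi_ab from the recursion in (1)."""
    if a is None or b is None:
        return Fraction(0)
    if a == b:
        father, mother = PEDIGREE[a]
        return (1 + phi2(father, mother)) / 2
    if depth(a) < depth(b):
        a, b = b, a
    father, mother = PEDIGREE[a]
    return (phi2(father, b) + phi2(mother, b)) / 2


@lru_cache(maxsize=None)
def phi3(a, b, c):
    """Phi_abc from the recursion in (3)."""
    trio = (a, b, c)
    if None in trio:
        return Fraction(0)
    top = max(trio, key=depth)  # deepest individual: never an ancestor of the others
    rest = [i for i in trio if i != top]
    father, mother = PEDIGREE[top]
    if not rest:  # Phi_aaa
        return (1 + 3 * phi2(father, mother)) / 4
    if len(rest) == 1:  # Phi_aab
        return (phi2(top, rest[0]) + phi3(father, mother, rest[0])) / 2
    # Phi_abc with three distinct individuals
    return (phi3(father, rest[0], rest[1]) + phi3(mother, rest[0], rest[1])) / 2
```

For the full sibs `s1` and `s2`, `phi3("s1", "s1", "s2")` evaluates to 1/8, and for their inbred offspring `phi3("x", "x", "x")` evaluates to 7/16, in line with $\Phi_{aaa} = \frac{1}{4}(1 + 3F_a)$ and $F_x = 1/4$.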

2.2. Identity Coefficients and Condensed Identity Coefficients. Given two individuals $a$ and $b$ with maternally and paternally derived alleles at a fixed autosomal locus, there are 15 possible identity states, and the probabilities associated with each identity state are called identity coefficients. Ignoring the distinction between maternally and paternally derived alleles, we categorize the 15 possible states into 9 condensed identity states, as shown in Figure 1. The states range from state 1, in which all four alleles are IBD, to state 9, in which none of the four alleles are IBD. The probabilities associated with each condensed identity state are called condensed identity coefficients, denoted by $\{\Delta_i \mid 1 \le i \le 9\}$. The condensed identity coefficients can be computed from generalized kinship coefficients using the linear transformation shown in (6):

$$
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 & 1 \\
2 & 2 & 1 & 1 & 2 & 2 & 1 & 1 & 1 \\
4 & 0 & 2 & 0 & 2 & 0 & 2 & 1 & 0 \\
8 & 0 & 4 & 0 & 2 & 0 & 2 & 1 & 0 \\
8 & 0 & 2 & 0 & 4 & 0 & 2 & 1 & 0 \\
16 & 0 & 4 & 0 & 4 & 0 & 2 & 1 & 0 \\
4 & 4 & 2 & 2 & 2 & 2 & 1 & 1 & 1 \\
16 & 0 & 4 & 0 & 4 & 0 & 4 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
\Delta_1 \\ \Delta_2 \\ \Delta_3 \\ \Delta_4 \\ \Delta_5 \\ \Delta_6 \\ \Delta_7 \\ \Delta_8 \\ \Delta_9
\end{bmatrix}
=
\begin{bmatrix}
1 \\ 2\Phi_{aa} \\ 2\Phi_{bb} \\ 4\Phi_{ab} \\ 8\Phi_{aab} \\ 8\Phi_{abb} \\ 16\Phi_{aabb} \\ 4\Phi_{aa,bb} \\ 16\Phi_{ab,ab}
\end{bmatrix}
\tag{6}
$$
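Once the generalized kinship coefficients on the right-hand side of (6) are known, the condensed identity coefficients follow by solving a 9 × 9 linear system. The sketch below does this with exact Gauss-Jordan elimination; the helper names `solve` and `condensed_identity` are hypothetical, and the input values are those that (6) implies for non-inbred full siblings ($\Delta_7 = 1/4$, $\Delta_8 = 1/2$, $\Delta_9 = 1/4$), used here only as a check.

```python
from fractions import Fraction

# Coefficient matrix of (6); column i multiplies Delta_i.
M = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [2, 2, 2, 2, 1, 1, 1, 1, 1],
    [2, 2, 1, 1, 2, 2, 1, 1, 1],
    [4, 0, 2, 0, 2, 0, 2, 1, 0],
    [8, 0, 4, 0, 2, 0, 2, 1, 0],
    [8, 0, 2, 0, 4, 0, 2, 1, 0],
    [16, 0, 4, 0, 4, 0, 2, 1, 0],
    [4, 4, 2, 2, 2, 2, 1, 1, 1],
    [16, 0, 4, 0, 4, 0, 4, 1, 0],
]


def solve(matrix, rhs):
    """Gauss-Jordan elimination over exact fractions; assumes a nonsingular matrix."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Pick any row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def condensed_identity(phi_aa, phi_bb, phi_ab, phi_aab, phi_abb,
                       phi_aabb, phi_aa_bb, phi_ab_ab):
    """Return [Delta_1, ..., Delta_9] from the generalized kinship coefficients in (6)."""
    rhs = [1, 2 * phi_aa, 2 * phi_bb, 4 * phi_ab, 8 * phi_aab,
           8 * phi_abb, 16 * phi_aabb, 4 * phi_aa_bb, 16 * phi_ab_ab]
    return solve(M, rhs)


# Non-inbred full siblings (illustrative input values).
deltas = condensed_identity(
    Fraction(1, 2), Fraction(1, 2), Fraction(1, 4), Fraction(1, 8),
    Fraction(1, 8), Fraction(1, 16), Fraction(1, 4), Fraction(3, 32),
)
```

Exact fractions are used so that the recovered coefficients can be compared with known values without rounding error; a floating-point solver would serve equally well for large batches of pairs.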

In our work, we focus on deriving the path-counting formulas for the generalized kinship coefficients, including $\Phi_{abc}$, $\Phi_{abcd}$, and $\Phi_{ab,cd}$.

2.3. Terms Defined for Path-Counting Formulas for Three and Four Individuals

(1) Triple-Common Ancestor. Given three individuals $a$, $b$, and $c$, if $A$ is a common ancestor of the three individuals, then we call $A$ a triple-common ancestor of $a$, $b$, and $c$.

(2) Quad-Common Ancestor. Given four individuals $a$, $b$, $c$, and $d$, if $A$ is a common ancestor of the four individuals, then we call $A$ a quad-common ancestor of $a$, $b$, $c$, and $d$.

(3) $P(A, a)$. It denotes the set of all possible paths from $A$ to $a$, where the paths can only traverse edges in the direction of parent to child, such that $P(A, a) \neq \mathit{NULL}$ if and only if $A$ is an ancestor of $a$. $P_{Aa}$ denotes a particular path from $A$ to $a$, where $P_{Aa} \in P(A, a)$.

(4) Path-Pair. It consists of two paths, denoted as $\langle P_{Aa}, P_{Ab}\rangle$, where $P_{Aa} \in P(A, a)$ and $P_{Ab} \in P(A, b)$.

(5) Nonoverlapping Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab}\rangle$, it is nonoverlapping if and only if the two paths share no common individuals except $A$.

(6) Path-Triple. It consists of three paths, denoted as $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, where $P_{Aa} \in P(A, a)$, $P_{Ab} \in P(A, b)$, and $P_{Ac} \in P(A, c)$.

(7) Path-Quad. It consists of four paths, denoted as $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$, where $P_{Aa} \in P(A, a)$, $P_{Ab} \in P(A, b)$, $P_{Ac} \in P(A, c)$, and $P_{Ad} \in P(A, d)$.

(8) $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$. It denotes all common individuals shared between $P_{Aa}$ and $P_{Ab}$, except $A$.

(9) $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$. It denotes all common individuals shared among $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, except $A$.

(10) $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$. It denotes all common individuals shared among $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$, except $A$.

(11) Crossover and 2-Overlap Individual. If $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$, we call $s$ a crossover individual with respect to $P_{Aa}$ and $P_{Ab}$ if the two paths pass through different parents of $s$. On the other hand, if $P_{Aa}$ and $P_{Ab}$ pass through the same parent of $s$, then we call $s$ a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$.

(12) 3-Overlap Individual. If $s \in \mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$ and the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$ pass through the same parent of $s$, then we call $s$ a 3-overlap individual with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$.

(13) 2-Overlap Path. If $s$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, then both $P_{Aa}$ and $P_{Ab}$ pass through the same parent of $s$, denoted by $p$, and the edge from $p$ to $s$ is called an overlap edge. All consecutive overlap edges constitute a path, and this path is called a 2-overlap path. If the 2-overlap path extends all the way to the ancestor $A$, we call it a root 2-overlap path.

(14) 3-Overlap Path. It consists of all 3-overlap individuals in a consecutive order. If the 3-overlap path extends all the way to the root $A$, we call it a root 3-overlap path.

Figure 1: The 15 possible identity states for individuals $a$ and $b$, grouped by their 9 condensed states ($\Delta_1$ to $\Delta_9$). Lines indicate alleles that are IBD.

Figure 2: Examples of path-pairs and path-triples.

Example 1. Consider the path-pairs from $A$ to $a$ and $b$ in Figure 2, where $A$ is a common ancestor of $a$ and $b$. For path-pair 1, $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) = \{s, e, t\}$, and $A \to s \to e \to t$ is a root 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$. For path-pair 4, $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) = \{e, t\}$, where $e$ is a crossover individual, $t$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, and $e \to t$ is a 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$ that is not a root 2-overlap path, because it does not start at $A$.

Example 2. There are four path-quads listed in Figure 3 from $A$ to four individuals $a$, $b$, $c$, and $d$, where $A$ is a quad-common ancestor of the four individuals. For path-quad 2, considering the paths $P_{Aa}$ and $P_{Ab}$, the path $A \to t \to f \to s$ is a root 2-overlap path; $t$, $f$, and $s$ are 2-overlap individuals with respect to $P_{Aa}$ and $P_{Ab}$. For path-quad 3, $t$, $f$, and $s$ are 3-overlap individuals with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, and the path $A \to t \to f \to s$ is a root 3-overlap path.
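The definitions of crossover and 2-overlap individuals translate into a short check on a path-pair. In the Python sketch below, a path is assumed to be a list of individuals ordered from the common ancestor $A$ down to its endpoint, with each individual occurring at most once per path; the function names `classify_shared` and `has_root_2_overlap` are hypothetical.

```python
def classify_shared(path_a, path_b):
    """Split Bi_C(P_Aa, P_Ab) into crossover and 2-overlap individuals.

    Each path is a list of individuals from the common ancestor A down to its endpoint.
    """
    assert path_a[0] == path_b[0], "both paths must start at the same ancestor A"
    # Map every individual on a path (except A) to the parent through which the path reaches it.
    parent_a = {child: parent for parent, child in zip(path_a, path_a[1:])}
    parent_b = {child: parent for parent, child in zip(path_b, path_b[1:])}
    crossover, two_overlap = [], []
    for s in path_a[1:]:
        if s in parent_b:  # s lies on both paths, so s is in Bi_C (A itself is excluded)
            if parent_a[s] == parent_b[s]:
                two_overlap.append(s)  # both paths reach s through the same parent
            else:
                crossover.append(s)  # the paths reach s through different parents
    return crossover, two_overlap


def has_root_2_overlap(path_a, path_b):
    """True if both paths leave A along the same edge, i.e., an overlap path starts at A."""
    return len(path_a) > 1 and len(path_b) > 1 and path_a[1] == path_b[1]


# Path-pairs built from the individuals named in Figure 2.
pair_overlap = (["A", "s", "e", "t", "a"], ["A", "s", "e", "t", "b"])
pair_cross = (["A", "s", "e", "t", "a"], ["A", "d", "f", "t", "b"])
pair_clean = (["A", "s", "e", "t", "a"], ["A", "d", "b"])
```

With these inputs, `classify_shared(*pair_overlap)` reports `s`, `e`, and `t` as 2-overlap individuals, `classify_shared(*pair_cross)` reports `t` as a crossover individual, and `classify_shared(*pair_clean)` returns two empty lists, which is the nonoverlapping case.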

Table 1 summarizes the conceptual terms used in the path-counting formulas for two, three, and four individuals; from the terminology side, it gives a glimpse of our framework for generalizing Wright's formula to three and four individuals.

Table 1: The conceptual terms used for two, three, and four individuals.

Two individuals | Three individuals | Four individuals
Common ancestor | Triple-common ancestor | Quad-common ancestor
Path-pair | Path-triple | Path-quad
$\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ | $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$ | $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$
NA | 2-Overlap individual | 3-Overlap individual
NA | 2-Overlap path | 3-Overlap path
NA | Root 2-overlap path | Root 3-overlap path
NA | Crossover individual | Crossover individual

Figure 3: Examples of path-quads.
Path-quad 1: $A \to t \to f \to s \to a$; $A \to m \to s \to b$; $A \to c$; $A \to d$.
Path-quad 2: $A \to t \to f \to s \to a$; $A \to t \to f \to s \to b$; $A \to c$; $A \to d$.
Path-quad 3: $A \to t \to f \to s \to a$; $A \to t \to f \to s \to b$; $A \to t \to f \to s \to c$; $A \to d$.
Path-quad 4: $A \to t \to f \to s \to a$; $A \to t \to m \to s \to b$; $A \to t \to m \to s \to c$; $A \to d$.

2.4. An Overview of Path-Counting Formula Derivation. According to Wright's path-counting formula [16] (see (2)) for two individuals $a$ and $b$, the path-counting approach requires identifying common ancestors of $a$ and $b$ and calculating the contribution of each common ancestor to $\Phi_{ab}$. More specifically, for each common ancestor, denoted as $A$, we obtain all path-pairs from $A$ to $a$ and $b$ and identify acceptable path-pairs. For $\Phi_{ab}$, an acceptable path-pair $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair, where the two paths share no common individuals except $A$. In Figure 2, path-pair 2 is an acceptable path-pair, while path-pair 1, path-pair 3, and path-pair 4 are not acceptable path-pairs. The contribution of each common ancestor $A$ to $\Phi_{ab}$ is computed based on the inbreeding coefficient of $A$, modified by the length of each acceptable path-pair.

To compute $\Phi_{abc}$, the path-counting approach requires identifying all triple-common ancestors of $a$, $b$, and $c$ and summing up all triple-common ancestors' contributions to $\Phi_{abc}$. For each triple-common ancestor, denoted as $A$, we first identify all path-triples, each of which consists of three paths from $A$ to $a$, $b$, and $c$, respectively. Some examples of path-triples are presented in Figure 2.

For $\Phi_{ab}$, only nonoverlapping path-pairs are acceptable. A path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ consists of three path-pairs: $\langle P_{Aa}, P_{Ab}\rangle$, $\langle P_{Aa}, P_{Ac}\rangle$, and $\langle P_{Ab}, P_{Ac}\rangle$. For $\Phi_{abc}$, a path-triple might be acceptable even though either 2-overlap individuals or crossover individuals exist between a path-pair. The main challenge we need to address is finding necessary and sufficient conditions for acceptable path-triples.

To identify acceptable path-triples, we first use a systematic method to generate all possible cases for a path-pair by considering the different types of common individuals shared between the two paths. Then, we introduce building blocks, which are connected graphs with conditions on every edge that encapsulate a set of acceptable cases of path-pairs. In each building block, we represent paths as nodes and interactions (i.e., shared common individuals between two paths) as edges. There are at least two paths in a building block. For each building block, we obtain all acceptable cases for the concerned path-pairs. A given path-triple can be decomposed into one or multiple building blocks. Considering a path-pair shared between two building blocks, we use the natural join operator from relational algebra to match the acceptable cases for the shared path-pair between the two building blocks. In other words, taking the acceptable cases for the building blocks as inputs, we use the natural join operator to construct all acceptable cases for a path-triple. Acceptable cases for a path-triple are identified and then used in deriving the path-counting formula for $\Phi_{abc}$.
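The role of the natural join can be illustrated with a small Python sketch. The two relations below are purely hypothetical: the attribute names (`ab`, `ac`, `bc`, one per path-pair) and the case labels are placeholders chosen for this example and are not the actual acceptable-case tables of the building blocks.

```python
def natural_join(left, right):
    """Natural join of two relations given as lists of dicts.

    Two rows are combined when they agree on every attribute they have in common.
    """
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                joined.append({**l_row, **r_row})
    return joined


# Hypothetical acceptable cases for two building blocks that share the path-pair "ab".
block_1 = [
    {"ab": "nonoverlap", "ac": "nonoverlap"},
    {"ab": "root_overlap", "ac": "nonoverlap"},
]
block_2 = [
    {"ab": "nonoverlap", "bc": "nonoverlap"},
    {"ab": "root_overlap", "bc": "crossover"},
    {"ab": "crossover", "bc": "nonoverlap"},
]

# Each joined row assigns one case to every path-pair of the path-triple.
triple_cases = natural_join(block_1, block_2)
```

Here only the rows that agree on the shared path-pair `ab` survive, so `triple_cases` contains two combined rows; the row of `block_2` with `ab` labelled `crossover` has no partner in `block_1` and is dropped.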

The main procedures used for deriving the path-counting formula for $\Phi_{abc}$ are summarized in the flowchart shown in Figure 4. The main procedures are also applicable for deriving the path-counting formulas for $\Phi_{abcd}$ and $\Phi_{ab,cd}$.

Figure 4: A flowchart for path-counting formula derivation.

3. Results and Discussion

3.1. Path-Counting Formulas for Three Individuals. We first introduce a systematic method to generate all possible cases for a path-pair. Then, we discuss building blocks for path-triples and identify all acceptable cases, which are used in deriving the path-counting formula for $\Phi_{abc}$.

3.1.1. Cases for a Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab} \rangle$ with $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathit{NULL}$, where $A$ is a common ancestor of $a$ and $b$ and $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ consists of all common individuals shared between $P_{Aa}$ and $P_{Ab}$, except $A$, we introduce three patterns (i.e., crossover, 2-overlap, and root 2-overlap) to generate all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$:

(1) $X(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ share one or multiple crossover individuals;

(2) $T(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are root 2-overlapping from $A$, and the root 2-overlap path can have one or multiple 2-overlap individuals;

(3) $Y(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are overlapping but not from $A$, and the 2-overlap path can have one or multiple 2-overlap individuals.

Based on the three patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$, we use regular expressions to generate all possible cases for the path-pair $\langle P_{Aa}, P_{Ab} \rangle$. For convenience, we drop $\langle P_{Aa}, P_{Ab} \rangle$ and use $X$, $T$, and $Y$ instead of the patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$ whenever there is no confusion. When $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathit{NULL}$, the eight cases shown in (7) cover all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$. The completeness of the eight cases shown in (7) for $\langle P_{Aa}, P_{Ab} \rangle$ can be proved by induction on the total number of $T$, $X$, and $Y$ appearing in $\langle P_{Aa}, P_{Ab} \rangle$. Using the pedigree in Figure 2, Cases 1–3 and Case 6 are illustrated in (8), (9), (10), and (11):

$$
\begin{aligned}
&\text{Case 1: } T, &&\text{Case 2: } X^{+},\\
&\text{Case 3: } TX^{+}, &&\text{Case 4: } T(X^{+}Y)^{+},\\
&\text{Case 5: } T(X^{+}Y)^{+}X^{+}, &&\text{Case 6: } X^{+}Y,\\
&\text{Case 7: } X^{+}(YX^{+})^{+}, &&\text{Case 8: } X^{+}(YX^{+})^{+}Y.
\end{aligned}
\tag{7}
$$
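The completeness claim for (7) can be checked by brute force on short pattern strings. The sketch below is illustrative only: the validity rule in `is_valid` (a single optional leading `T`, and every `Y` immediately preceded by an `X`) is our reading of the three pattern definitions, not a grammar stated in the text, and the function names are hypothetical.

```python
import re
from itertools import product

# The eight cases of (7), written as Python regular expressions.
CASES = {
    1: r"T",
    2: r"X+",
    3: r"TX+",
    4: r"T(X+Y)+",
    5: r"T(X+Y)+X+",
    6: r"X+Y",
    7: r"X+(YX+)+",
    8: r"X+(YX+)+Y",
}

def is_valid(word):
    """Assumed structural rule: T may only appear first, Y must follow an X."""
    if not word or "T" in word[1:]:
        return False
    return all(i > 0 and word[i - 1] == "X"
               for i, ch in enumerate(word) if ch == "Y")

def matching_cases(word):
    """Return the case numbers whose regular expression matches the whole word."""
    return [n for n, pattern in CASES.items() if re.fullmatch(pattern, word)]

def check_partition(max_len=8):
    """Every valid word matches exactly one case; every invalid word matches none."""
    for n in range(1, max_len + 1):
        for letters in product("TXY", repeat=n):
            word = "".join(letters)
            hits = matching_cases(word)
            if is_valid(word) and len(hits) != 1:
                return False
            if not is_valid(word) and hits:
                return False
    return True
```

Under the assumed rule, the eight expressions are mutually exclusive and jointly exhaustive for all words up to the tested length, which agrees with the induction argument mentioned above.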

$$
\begin{aligned}
&A \to s \to e \to t \to a\\
&A \to s \to e \to t \to b
\end{aligned}
\;\in T, \tag{8}
$$

where $s$, $e$, $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path;

$$
\begin{aligned}
&A \to s \to e \to t \to a\\
&A \to s \to f \to t \to b
\end{aligned}
\;\in TX, \tag{9}
$$

where $s$ is a 2-overlap individual, the overlap path is a root 2-overlap path, and $t$ is a crossover individual;

$$
\begin{aligned}
&A \to s \to e \to t \to a\\
&A \to d \to f \to t \to b
\end{aligned}
\;\in X, \tag{10}
$$

where $t$ is a crossover individual;

$$
\begin{aligned}
&A \to c \to e \to t \to a\\
&A \to s \to e \to t \to b
\end{aligned}
\;\in XY, \tag{11}
$$

where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.

[Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, showing the scenarios $S_0$, $S_1$, $S_2$, and $S_3$.]
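The four examples in (8)–(11) can be reproduced mechanically. The following sketch is an illustration under stated assumptions, not the authors' implementation: each path is given as a list of individuals starting at the common ancestor, a shared individual reached through the same parent on both paths is treated as an overlap individual, and one reached through different parents as a crossover individual. The function name `classify_path_pair` is hypothetical.

```python
def classify_path_pair(p1, p2):
    """Return the T/X/Y pattern string of two paths that start at the same ancestor."""
    if p1[0] != p2[0]:
        raise ValueError("both paths must start at the common ancestor")
    # Length of the common prefix; anything beyond the ancestor is a root 2-overlap.
    k = 0
    while k < min(len(p1), len(p2)) and p1[k] == p2[k]:
        k += 1
    symbols = ["T"] if k > 1 else []
    # Parent of each individual along the second path.
    parent_in_p2 = {child: parent for parent, child in zip(p2, p2[1:])}
    in_overlap = False
    for parent, child in zip(p1[k - 1:], p1[k:]):
        if child not in parent_in_p2:
            in_overlap = False              # individual is not shared
        elif parent_in_p2[child] == parent:
            if not in_overlap:
                symbols.append("Y")         # a new 2-overlap path, not from the root
            in_overlap = True
        else:
            symbols.append("X")             # shared individual, different parents
            in_overlap = False
    return "".join(symbols)

# The path-pairs of (8), (9), (10), and (11).
example_8 = classify_path_pair(list("Aseta"), list("Asetb"))
example_9 = classify_path_pair(list("Aseta"), list("Asftb"))
example_10 = classify_path_pair(list("Aseta"), list("Adftb"))
example_11 = classify_path_pair(list("Aceta"), list("Asetb"))
```

A run of consecutive overlap individuals is emitted as a single `T` or `Y`, while each crossover individual emits its own `X`, so the output can be matched against the regular expressions in (7).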

3.1.2. Path-Pair Level Graphical Representation of a Path-Triple. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, we represent each path as a node. The path-triple can be decomposed into three path-pairs (i.e., $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual), except $A$, then there is an edge between the two nodes representing the two paths. Therefore, we obtain four different scenarios $S_0$–$S_3$, shown in Figure 5.

In Figure 5, the scenario $S_0$ has no edges, which means that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths. In Figure 2, path-triple1 is an example of $S_0$. Next, we introduce a lemma which can assist with identifying the options for the edges in the scenarios $S_1$–$S_3$.

Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$; if there is a 2-overlap edge, which is represented by $Y$ in the regular expression representation of any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no contribution to $\Phi_{abc}$.

Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at allele-level starting from a triple-common ancestor $A$ to the three individuals $a$, $b$, and $c$.

[Figure 6: Examples of pedigree and inheritance paths. (a) Pedigree. (b) Inheritance paths. Both panels involve the individuals $A$, $p_1$–$p_8$, $a$, $b$, and $c$.]

For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ listed as follows: $P_{Aa}$: $A \to a$; $P_{Ab}$: $A \to p_3 \to p_6 \to p_7 \to b$; $P_{Ac}$: $A \to p_4 \to p_6 \to p_7 \to c$.

For $\langle P_{Ab}, P_{Ac} \rangle$, $p_6$ is a crossover individual, $p_7$ is an overlap individual, and $p_6 \to p_7$ is a 2-overlap edge represented by $Y$ in the regular expression representation (see the definition of $Y$ in Section 3.1.1).

For the individual $p_6$, let us denote the two alleles at one fixed autosomal locus as $g_1$ and $g_2$. At allele-level, only one allele can be passed down from $p_6$ to $p_7$. Since $p_3$ and $p_4$ are parents of $p_6$, $g_1$ is passed down from one parent and $g_2$ is passed down from the other parent. It is infeasible to pass down both $g_1$ and $g_2$ from $p_6$ to $p_7$. In other words, there are no corresponding inheritance paths for the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with a 2-overlap edge between $\langle P_{Ab}, P_{Ac} \rangle$ (i.e., Case 6: $XY$). Therefore, such path-triples have no contribution to $\Phi_{abc}$.

Figure 6(b) shows one example of eligible inheritance paths corresponding to a pedigree graph. Each individual is represented by two allele nodes. The eligible inheritance paths in Figure 6(b) consist of red edges only.

Only Case 1, Case 2, and Case 3 do not have $Y$ in the regular expression representation of a path-pair (see (7)); considering the scenarios $S_1$–$S_3$ shown in Figure 5, an edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$.

3.1.3. Constructing Cases for a Path-Triple. For the scenarios $S_1$–$S_3$ in Figure 5, we define two building blocks $\{B_1, B_2\}$, along with some rules, in Figure 7 to generate acceptable cases. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, we cannot allow both edges to be root overlap, because if two edges are root overlap, then $P_{Aa}$ and $P_{Ac}$ must share at least one common individual, except $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

[Figure 7: Building blocks $\{B_1, B_2\}$ and basic rules. $B_1$ consists of two paths joined by one edge; $B_2$ consists of three paths joined by two edges. For $B_1$, the edge can have three options: case 1, $T$; case 2, $X$; case 3, $TX$. For $B_2$, there can be at most one edge belonging to root overlap (either $T$ or $TX$).]

[Figure 8: A graphical illustration for obtaining $T_3$: the scenario $S_3$ with edges $e_1$, $e_2$, $e_3$ is decomposed into $u_1$, $u_2$, $u_3$, and $T_3 = R_1 \bowtie R_2 \bowtie R_3$. Note: $R_i$ denotes all acceptable path-triples for $u_i$.]

Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ into $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples, denoted as $R_i$.

Considering the dependency among $\{R_1, R_2, R_3\}$, we use the natural join operator, denoted as $\bowtie$, operating on $\{R_1, R_2, R_3\}$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in the scenario $S_3$.

For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. The scenario $S_0$ has no edges, which shows that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or

(2) one edge belongs to root 2-overlap, and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but zero 2-overlap paths.
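The natural-join construction of $T_3$ can be sketched in a few lines. This is a minimal illustration, assuming that the three copies of $B_2$ cover the edge pairs $(e_1, e_2)$, $(e_2, e_3)$, and $(e_1, e_3)$; that assignment, like the names `block_b2` and `natural_join`, is our assumption and is not spelled out in the text.

```python
from itertools import product

OPTIONS = ("T", "X", "TX")      # edge options that contain no Y (Lemma 3)
ROOT_OVERLAP = {"T", "TX"}      # options that involve a root 2-overlap

def block_b2(edge1, edge2):
    """Acceptable cases of a B2 block: at most one of its two edges is a root overlap."""
    return [
        {edge1: o1, edge2: o2}
        for o1, o2 in product(OPTIONS, repeat=2)
        if not (o1 in ROOT_OVERLAP and o2 in ROOT_OVERLAP)
    ]

def natural_join(left, right):
    """Relational natural join on the edge names shared by the two relations."""
    joined = []
    for row_l, row_r in product(left, right):
        shared = row_l.keys() & row_r.keys()
        if all(row_l[name] == row_r[name] for name in shared):
            joined.append({**row_l, **row_r})
    return joined

R1 = block_b2("e1", "e2")
R2 = block_b2("e2", "e3")
R3 = block_b2("e1", "e3")
T3 = natural_join(natural_join(R1, R2), R3)
```

Under these assumptions each $R_i$ has five rows and $T_3$ has seven: one with all three edges as crossover, and six with exactly one root-overlap edge, which matches the two options listed for $S_k$ above.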

3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals into a noncrossover path-triple without changing the contribution from this path-triple to $\Phi_{abc}$. The main purpose of using the splitting operator is to simplify the path-counting formula derivation process. We first use an example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ into two nodes $s_1$ and $s_2$;

(2) transform the edges $s \to a'$ and $s \to b'$ into $s_1 \to a'$ and $s_2 \to b'$, respectively;

(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
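The three steps above can be sketched on an adjacency-set representation of the pedigree. This is only an illustration under assumptions: the graph is a dictionary from a parent to its set of children, and the incoming edges of $s$ are redirected so that the parent on $P_{Aa}$ points to $s_1$ and the parent on $P_{Ab}$ points to $s_2$, as suggested by the paths listed for $G_k$ in Figure 9. The text itself only describes the outgoing edges, and the function names are hypothetical.

```python
def split_crossover(children, s, parent_on_a, parent_on_b):
    """Return a new graph in which the crossover individual s is split into s1 and s2."""
    if parent_on_a == parent_on_b:
        raise ValueError("a crossover individual is reached through two different parents")
    s1, s2 = f"{s}1", f"{s}2"
    # Copy every node except s itself.
    new = {u: set(vs) for u, vs in children.items() if u != s}
    # Assumed handling of incoming edges (read off the paths in Figure 9).
    new[parent_on_a].discard(s)
    new[parent_on_a].add(s1)
    new[parent_on_b].discard(s)
    new[parent_on_b].add(s2)
    # Steps (2) and (3): after both steps, s1 and s2 each point to a' and b'.
    new[s1] = set(children[s])
    new[s2] = set(children[s])
    return new

def has_path(children, path):
    """True if every consecutive pair of the path is an edge of the graph."""
    return all(v in children.get(u, set()) for u, v in zip(path, path[1:]))

# A small graph shaped like G_{k+1} in Figure 9, with the '...' segments collapsed.
G = {
    "A": {"x", "w", "c"},
    "x": {"s"}, "w": {"s"},
    "s": {"a'", "b'"},
    "a'": {"a"}, "b'": {"b"},
}
G_split = split_crossover(G, "s", parent_on_a="x", parent_on_b="w")
```

After the split, the path to $a$ runs through $s_1$ and the path to $b$ through $s_2$, so the two paths no longer share $s$, while both path lengths are unchanged.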

Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.

Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If there is a new crossover node appearing, the only possible node is either $a'$ or $b'$. Assume $b'$ becomes a crossover individual; it means that $b'$ is able to reach $a$ and $b$ from two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.

Next, we introduce a canonical graph, which results from applying the splitting operator to all crossover individuals. The canonical graph has zero crossover individuals.

Definition 5 (Canonical Graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, if there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$,

(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$,

we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.

Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, there exists a canonical graph $G'$ for $G$.

[Figure 9: Transforming pedigree graph $G_{k+1}$ having $k + 1$ crossovers to $G_k$ having $k$ crossovers. Solid and dashed edges denote parent-child and ancestor-descendant relationships. For $G_{k+1}$, $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ is $A \to \cdots \to x \to s \to a' \to \cdots \to a$; $A \to \cdots \to w \to s \to b' \to \cdots \to b$; $A \to c$. For $G_k$, $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ is $A \to \cdots \to x \to s_1 \to a' \to \cdots \to a$; $A \to \cdots \to w \to s_2 \to b' \to \cdots \to b$; $A \to c$.]

[Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, showing the scenarios $S_0$–$S_{10}$.]

Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4.

3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now, we present the path-counting formula for $\Phi_{abc}$:

$$
\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AA} \right), \tag{12}
$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$ is the inbreeding coefficient of $A$; $A$ is a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

$$
L_{\text{triple}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2},
\end{cases}
\tag{13}
$$

and $L_{P_{Aa}}$ is the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).

For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
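As a worked illustration of (12) and (13), the sketch below evaluates the contribution of a single path-triple with exact rational arithmetic. The function and parameter names are hypothetical, path lengths are counted in edges, and the two small pedigrees in the usage lines are our own examples rather than ones taken from the paper.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_AA(F_A):
    """Phi_AA = (1/2)(1 + F_A)."""
    return Fraction(1, 2) * (1 + F_A)

def phi_AAA(F_A):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * F_A)

def triple_contribution(len_a, len_b, len_c, F_A=Fraction(0), root_overlap_len=None):
    """Contribution of one acceptable path-triple to Phi_abc, following (12) and (13).

    root_overlap_len is None for Type 1; otherwise it is the length of the
    root 2-overlap path P_As (Type 2).
    """
    if root_overlap_len is None:                       # Type 1
        L_triple = len_a + len_b + len_c
        return HALF ** L_triple * phi_AAA(F_A)
    L_triple = len_a + len_b + len_c - root_overlap_len  # Type 2
    return HALF ** (L_triple + 1) * phi_AA(F_A)

# Three full siblings with two non-inbred founder parents: one Type 1
# path-triple of lengths (1, 1, 1) per parent.
full_sibs = 2 * triple_contribution(1, 1, 1)

# A -> s, s -> a, s -> b, A -> c: a Type 2 path-triple with P_As of length 1.
type2_example = triple_contribution(2, 2, 1, root_overlap_len=1)
```

For the full-sibling example the sum is 1/16, which agrees with a direct allele-level count (the three sampled alleles come from the same parent with probability 1/4 and are then copies of the same parental allele with probability 1/4). The Type 2 example evaluates to 1/64.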

3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ and $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios $S_0$–$S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks $\{B_1, B_2, B_3\}$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap, because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that, for $B_3$, if $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.

3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 11, we first decompose $S_i$ into one or multiple building blocks. For a scenario $S_i \in \{S_1, S_3\}$, it has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$, because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.

[Figure 11: Building blocks $B_1$, $B_2$, and $B_3$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For $B_3$, all three edges belong to root overlap (i.e., having root 3-overlap), with $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.]

Table 2: Largest subgraph of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).

| $S_i$ | $S_4$ | $S_5$ | $S_7$ | $S_8$ | $S_9$ | $S_{10}$ |
| --- | --- | --- | --- | --- | --- | --- |
| $S_j$ | $S_3$ | $S_3$ | $S_6$ | $S_5$ | $S_7$ | $S_9$ |

For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;

(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;

(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.

For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then, we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply this idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;

(2) otherwise, there can be at most two root 2-overlap paths.

3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now, we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$
\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+2} \Phi_{AA} \right), \tag{14}
$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$ is the inbreeding coefficient of $A$; $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; Type 1: zero root 2-overlap and zero root 3-overlap path; Type 2: one root 2-overlap path $P_{As}$ ending at $s$; Type 3 comprises three cases:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;

Case 2: one root 3-overlap path $P_{At}$ ending at $t$;

Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively;

$$
L_{\text{quad}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{At}} & \text{for Case 2} \in \text{Type 3},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases}
\tag{15}
$$

and $L_{P_{Aa}}$ is the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).

For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
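The case analysis in (14) and (15) can be written as a small dispatch on the type of the path-quad. The sketch below is illustrative only: the `kind` labels, the function name, and the example pedigrees are our assumptions, and lengths are counted in edges.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def quad_contribution(lengths, kind, overlap_lengths=(), F_A=Fraction(0)):
    """Contribution of one acceptable path-quad to Phi_abcd, following (14) and (15).

    lengths: the four path lengths (L_PAa, L_PAb, L_PAc, L_PAd).
    overlap_lengths: lengths of the root overlap paths required by `kind`.
    """
    phi_AA = Fraction(1, 2) * (1 + F_A)
    phi_AAA = Fraction(1, 4) * (1 + 3 * F_A)
    phi_AAAA = Fraction(1, 8) * (1 + 7 * F_A)
    total = sum(lengths)
    if kind == "type1":
        return HALF ** total * phi_AAAA
    if kind == "type2":                       # one root 2-overlap path P_As
        (l_s,) = overlap_lengths
        return HALF ** (total - l_s + 1) * phi_AAA
    if kind == "type3-case1":                 # two root 2-overlap paths
        l_s1, l_s2 = overlap_lengths
        return HALF ** (total - l_s1 - l_s2 + 2) * phi_AA
    if kind == "type3-case2":                 # one root 3-overlap path P_At
        (l_t,) = overlap_lengths
        return HALF ** (total - 2 * l_t + 2) * phi_AA
    if kind == "type3-case3":                 # root 2-overlap P_As and root 3-overlap P_At
        l_s, l_t = overlap_lengths
        return HALF ** (total - l_t - l_s + 2) * phi_AA
    raise ValueError(f"unknown kind: {kind}")

# Four full siblings with two non-inbred founder parents: one Type 1 path-quad per parent.
four_full_sibs = 2 * quad_contribution([1, 1, 1, 1], "type1")

# A -> t, t -> a, t -> b, t -> c, A -> d: one root 3-overlap path of length 1.
case2_example = quad_contribution([2, 2, 2, 1], "type3-case2", overlap_lengths=(1,))
```

The full-sibling example sums to 1/64 and the Case 2 example evaluates to 1/256; both agree with a direct allele-level count for these two small pedigrees.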

[Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with $A \to s \to a$, $A \to s \to b$, $A \to t \to c$, and $A \to t \to d$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with $A \to x \to a$, $A \to y \to b$, $A \to y \to c$, and $A \to x \to d$.]

3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathit{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathit{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.

Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$; $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; and $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$; $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; and $A \to x$ and $A \to y$ are root heter-overlap paths.

3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now, we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge can be $\{T, X, TX\}$. (Refer to Section 3.1.1 for the definitions of $T$, $X$, and $TX$.) Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

| $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ |
| --- | --- |
| Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap |
| One root 2-overlap path | One root homo-overlap and zero root heter-overlap |
| | Zero root homo-overlap and one root heter-overlap |
| Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap |
| | Zero root homo-overlap and two root heter-overlaps |
| One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$ |
| One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \neq h$ |
| | One root homo-overlap and two root heter-overlaps, and $h = r_1 \neq r_2$ |

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define that the generation of a progenitor $p_i$ is 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
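The generation function defined above is a longest-distance-from-a-progenitor computation and can be sketched with a memoized recursion. The input format (a dictionary from each individual to a tuple of its known parents) and the function name are assumptions made for illustration.

```python
def generations(parents):
    """Compute gen(x) for every individual.

    parents maps each individual to a tuple of its known parents;
    progenitors map to an empty tuple.
    """
    gen = {}

    def visit(x):
        if x not in gen:
            known = parents.get(x, ())
            # Progenitors have generation 0; otherwise one more than the
            # largest parental generation (covers one or two parents).
            gen[x] = 0 if not known else max(visit(p) for p in known) + 1
        return gen[x]

    for individual in parents:
        visit(individual)
    return gen

pedigree = {
    "p1": (), "p2": (),
    "f": ("p1",),          # one known parent
    "g": ("p2", "f"),      # two parents with generations 0 and 1
    "a": ("f", "g"),       # two parents with generations 1 and 2
}
gen = generations(pedigree)
```

For very deep pedigrees an iterative traversal in topological order would avoid Python's recursion limit; the recursive form is kept here because it mirrors the definition directly.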

The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$
\begin{aligned}
\Phi_{ab,cd} = {} & \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AA} \right)\\
& + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB},
\end{aligned}
\tag{16}
$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$ is a common ancestor of $a$ and $b$; and $T$ is a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).

[Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level, showing the scenarios $S_0$–$S_{16}$.]

Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

$$
\text{Type 3: }
\begin{cases}
\text{zero root homo-overlap and two root heter-overlaps } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2, \text{ respectively};\\
\text{one root homo-overlap } P_{Ah} \text{ ending at } h \text{ and two root heter-overlaps } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2, \text{ and } r_1 \neq r_2.
\end{cases}
\tag{17}
$$

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individuals; $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individuals. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$
B =
\begin{cases}
S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T),\\
S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents},\\
T & \text{otherwise},
\end{cases}
$$

$$
\begin{aligned}
L_{\text{2-pair}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{Ah}} & \text{for Type 4},
\end{cases}\\
L_{\langle P_{Sa}, P_{Sb} \rangle} &= L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5},\\
L_{\langle P_{Tc}, P_{Td} \rangle} &= L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}.
\end{aligned}
\tag{18}
$$

Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$
\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \ast \Phi_{TT}. \tag{19}
$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.

3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by making comparisons with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node is a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $NC(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows.

(1) If $u$ is $r$, then $NC(r)$ contains only one element: the empty string.

(2) Otherwise, let $u$ be a node with $NC(u)$, and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then, for each $x$ in $NC(u)$, a code $xi\ast$ is added to $NC(v_i)$, where $0 \le i \le k$ and $\ast$ indicates the gender of the individual represented by node $v_i$.
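The two assignment rules above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation used in [11] or [18]: the function name `assign_nodecodes`, the dictionary-based graph, and the string encoding of a code as sibling index followed by a gender letter (`M` or `F`) are all choices made here. Because an individual in a pedigree can have two parents, the sketch visits nodes in topological order (Kahn's algorithm) rather than plain breadth-first order, so that a node's code set is complete before it is propagated to its children.

```python
from collections import deque

def assign_nodecodes(children, gender, root):
    """Assign NodeCodes to every node reachable from root.

    children: dict mapping each node to its children in sibling order
              (every node must appear as a key, possibly with an empty list).
    gender:   dict mapping each node to 'M' or 'F'.
    root:     the single progenitor (in-degree 0).
    """
    indegree = {u: 0 for u in children}
    for kids in children.values():
        for v in kids:
            indegree[v] += 1

    codes = {u: [] for u in children}
    codes[root].append("")                 # rule (1): the empty string
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children[u]):
            for x in codes[u]:             # rule (2): extend every code of u
                codes[v].append(f"{x}{i}{gender[v]}")
            indegree[v] -= 1
            if indegree[v] == 0:           # all parents of v are done
                queue.append(v)
    return codes

# Hypothetical pedigree: virtual progenitor r with children f and m,
# who are both parents of c.
children = {"r": ["f", "m"], "f": ["c"], "m": ["c"], "c": []}
gender = {"r": "M", "f": "M", "m": "F", "c": "F"}
codes = assign_nodecodes(children, gender, "r")
print(codes["c"])  # ['0M0F', '1F0F']
```

Each code of `c` spells out one path from the progenitor, so two individuals share an ancestor exactly when some code of one and some code of the other have a common prefix.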


Computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.

Figure 14: The effect of pedigree size on computation efficiency improvement. (Plot of average time (ms) against the number of individuals in the pedigree, for Recursive and NodeCodes.)

Figure 15: The effect of depth on computation efficiency improvement. (Plot of average time (ms) against depth, from 4 to 18, for Recursive and NodeCodes.)

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula for more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.

Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2}\rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.


Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$. (Panels show the scenarios $S_0$, $S_1$, $S_2$, and $S_3$, with the case in which $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable drawn as a single path $P_{Aa}$.)

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2}\rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$, by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$\Phi_{aab} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle} + 1} \Phi_{AA} \right), \tag{A.1}$$

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has no root 2-overlap.

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases}$$

$$L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.2}$$

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.

A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

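$$\begin{aligned} \Phi_{aabc} ={} & \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}} - 1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AA} \right) \\ & + \sum_{A} \left( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 2} \Phi_{AA} \right), \end{aligned} \tag{A.3}$$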

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.

Case 2: one root 3-overlap path $P_{At}$ ending at $t$.

Case 3: one root 2-overlap and one root 3-overlap paths $P_{As}$ and $P_{At}$ ending at $s$ and $t$, respectively.

(A.4)

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path.

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases}$$

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}. \end{cases} \tag{A.5}$$


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must therefore be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as a default mergeable path-pair. Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

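$$\Phi_{aaab} = \sum_{A} \left( \frac{3}{2} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} \right) + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{pair}} + 2} \Phi_{AA} \right), \tag{A.6}$$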

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path.

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases}$$

$$L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.7}$$

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
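As a small worked instance of the path-counting formula for $\Phi_{aaab}$, consider a hypothetical pedigree (not one of the paper's figures) in which $a$ and $b$ are half-siblings whose only common ancestor is a non-inbred founder $A$, so that $\Phi_{AA} = 1/2$ is assumed. There is a single path $A \to a$, so every $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable and only Type 3 contributes:

```latex
\begin{align*}
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} = 1 + 1 = 2, \\
\Phi_{aaab} &= \left(\tfrac{1}{2}\right)^{L_{\text{pair}} + 2} \Phi_{AA}
             = \left(\tfrac{1}{2}\right)^{4} \cdot \tfrac{1}{2}
             = \tfrac{1}{32}.
\end{align*}
```

A direct allele-level argument gives the same value: each of the three alleles sampled from $a$ must be the copy inherited from $A$ (probability $(1/2)^3$), the allele sampled from $b$ must be the copy inherited from $A$ (probability $1/2$), and the two transmitted copies must be identical by descent (probability $\Phi_{AA} = 1/2$), giving $1/32$.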

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$. (Nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$.)

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.

Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

$\Longleftarrow \Phi_{aab}$

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

$\Longleftarrow \Phi_{abc}$

(B.1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;


Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Edges denote either a parent-child relationship or an ancestor-descendant relationship.)

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof. (Basis) There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^{2}\Phi_{AAc} = (1/2)^{3}\Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}} \Phi_{ccc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA} = (1/2)^{3}\Phi_{AAA}$.

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{\ast}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{\ast}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} \ast \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AAA}, \\ &\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\ &\therefore\; \Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA}. \end{aligned} \tag{B.2}$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle} + 1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}} \Phi_{ccc}$.

Thus, it is true for $n = k + 1$.

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof. (Basis) There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.


Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^{2}\Phi_{ssc} = (1/2)^{3}\Phi_{sc} = (1/2)^{5}\Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle} + 1} \Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^{4}\Phi_{AA}$ and $\Phi_{abc} = (1/2)^{5}\Phi_{AA}$, respectively.

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{\ast}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{\ast}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1 + 1} \Phi_{AA}, \\ &\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\ &\therefore\; \Phi_{abc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1 + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AA}. \end{aligned} \tag{B.3}$$

For Figures 21(b) and 21(c), we take the same steps as we did to calculate $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.
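The base cases of B.1.1 and B.1.2 can be checked numerically. The sketch below is not the authors' implementation: it evaluates the recursive formulas (3) and (4) together with the boundary values $\Phi_{aa} = (1/2)(1 + \Phi_{fm})$ and $\Phi_{aaa} = (1/4)(1 + 3\Phi_{fm})$ from Karigl's recursion, which are assumed here because this section does not restate them. The name `make_phi`, the integer numbering of individuals, and the two pedigree shapes (read off the captions of Figures 18(b) and 20(a)) are assumptions made for illustration; missing parents are taken to contribute zero, as in the proofs above.

```python
from fractions import Fraction
from functools import lru_cache

def make_phi(parents):
    """Build a generalized kinship function for up to three individuals.

    parents maps id -> (father, mother); None marks a missing parent.
    Assumption: ids are integers and every parent has a smaller id than
    its child, so the largest id is never an ancestor of the others.
    """
    half = Fraction(1, 2)

    @lru_cache(maxsize=None)
    def phi(ids):
        assert 1 <= len(ids) <= 3
        if None in ids:                 # a missing parent contributes zero
            return Fraction(0)
        if len(ids) == 1:
            return Fraction(1)
        a = max(ids)                    # individual to expand
        f, m = parents[a]
        others = tuple(i for i in ids if i != a)
        copies = len(ids) - len(others)
        if copies == 1:                 # formula (3) and its two-individual analogue
            return half * (phi((f,) + others) + phi((m,) + others))
        if copies == 2:                 # formula (4); with no others this is Phi_aa
            return half * (phi((a,) + others) + phi((f, m) + others))
        return Fraction(1, 4) * (1 + 3 * phi((f, m)))   # Phi_aaa

    return phi

# Figure 18(b)-style pedigree (assumed shape): A=0 is the parent of a=1, b=2, c=3.
phi_18b = make_phi({0: (None, None), 1: (0, None), 2: (0, None), 3: (0, None)})
recursive_18b = phi_18b((1, 2, 3))
path_count_18b = Fraction(1, 2) ** 3 * phi_18b((0, 0, 0))     # Type 1: L = 1 + 1 + 1

# Figure 20(a)-style chain (assumed shape): c=0 -> b=1 -> a=2.
phi_20a = make_phi({0: (None, None), 1: (0, None), 2: (1, None)})
recursive_20a = phi_20a((2, 1, 0))
path_count_20a = Fraction(1, 2) ** (2 + 1) * phi_20a((0, 0))  # Type 2: L = 2 + 1 + 0 - 1

print(recursive_18b, path_count_18b)  # 1/32 1/32
print(recursive_20a, path_count_20a)  # 1/16 1/16
```

Using exact fractions rather than floating point keeps the comparison between the recursive and path-counting values an equality check.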

Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in \text{Type 3}$, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as follows: $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,

$$\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb}) = \frac{1}{2}(\Phi_{ab} + 0) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B.4}$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}} \Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle} + 1} \Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, it shows that the path-counting formula (A.1) is true for Case 2.3.
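To make the Case 2.3 argument concrete, here is a worked instance on a hypothetical pedigree (an assumption for illustration, not a reproduction of Figure 22): a non-inbred founder $A$ has children $f$ and $b$, and $a$ is a child of $f$ whose other parent is missing. The founder value $\Phi_{AA} = 1/2$ is assumed. Because $a$ has a single parent, $P_{Aa_1}$ and $P_{Aa_2}$ both equal $A \to f \to a$, so only Type 3 of (A.1) contributes.

```latex
\begin{align*}
\text{Path counting (A.1):}\quad
  L_{\langle P_{Aa}, P_{Ab}\rangle} &= L_{P_{Aa}} + L_{P_{Ab}} = 2 + 1 = 3, \\
  \Phi_{aab} &= \left(\tfrac{1}{2}\right)^{3 + 1} \Phi_{AA}
              = \tfrac{1}{16} \cdot \tfrac{1}{2} = \tfrac{1}{32}. \\[4pt]
\text{Recursion (4) with Wright's formula:}\quad
  \Phi_{ab} &= \left(\tfrac{1}{2}\right)^{3} \Phi_{AA} = \tfrac{1}{16}, \\
  \Phi_{aab} &= \tfrac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right)
              = \tfrac{1}{2}\left(\tfrac{1}{16} + 0\right) = \tfrac{1}{32}.
\end{align*}
```

Both routes agree, as the proof requires; the extra factor of $1/2$ in the Type 3 exponent is exactly the $1/2$ in front of $\Phi_{ab}$ in formula (4).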

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).


Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Edges denote either a parent-child relationship or an ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. The proof is by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, (12) and (A.1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \to \cdots \to a,\; t_b\colon A_1 \to \cdots \to b,\; t_c\colon A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, we know that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
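A small worked instance of case (1) may help. Consider a hypothetical pedigree (an assumption for illustration) in which $a$, $b$, and $c$ are full siblings whose parents $A_1$ and $A_2$ are unrelated, non-inbred founders, so that $\Phi_{A_1A_1A_1} = \Phi_{A_2A_2A_2} = 1/4$ is assumed. Neither founder is a descendant of the other, and every path-triple from either founder consists of three single-edge paths with no root overlap (Type 1 in formula (12)):

```latex
\begin{align*}
S(A_1) &= \left(\tfrac{1}{2}\right)^{1 + 1 + 1} \Phi_{A_1 A_1 A_1}
        = \tfrac{1}{8} \cdot \tfrac{1}{4} = \tfrac{1}{32}, \\
\Phi_{abc}(G') &= \left(\tfrac{1}{2}\right)^{1 + 1 + 1} \Phi_{A_2 A_2 A_2}
        = \tfrac{1}{32}, \\
\Phi_{abc}(G) &= \Phi_{abc}(G') + S(A_1) = \tfrac{1}{32} + \tfrac{1}{32} = \tfrac{1}{16}.
\end{align*}
```

Here $G'$ is the graph obtained by removing $A_1$ and its outgoing edges, in which $A_2$ is the only triple-common ancestor. Applying the recursive formula (3) directly to the three siblings gives the same value of $1/16$.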

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.

Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.

$\Longleftarrow \Phi_{aaab}$

Computational and Mathematical Methods in Medicine 19

Case21

Case31 ΦAAA

ΦAAA

Case41

Case42

Case34ΦAA

Case32

Case331

Case22

Case23

Case431

Case35

Case432

Case4 33

Case332

Case333

Figure 23 Dependency graph for different cases for four individuals

Case 31 119866 has path-quads⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩

with zero root overlapCase 32 119866 has path-quads

⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩

with one root 2-overlapCase 331 119866 has path-quads

⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩

with two root 2-overlapCase 332 119866 has path-quads

⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩

with one root 3-overlapCase 333 119866 has path-quads

⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩

with one root 2-overlapand one root 3-overlap

Case 34 119866 has path-triples⟨119875119860119886 119875119860119887 119875119860119888⟩

with zero root overlapCase 35 119866 has path-triples

⟨119875119860119886 119875119860119887 119875119860119888⟩

with one root overlap

lArr997904 Φ119886119886119887119888

Case 41 119866 has path-quads⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩

with zero root overlapCase 42 119866 has path-quads

⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩

with one root 2-overlapCase 431 119866 has path-quads

⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩

with two root 2-overlapCase 432 119866 has path-quads

⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩

with one root 3-overlapCase 433 119866 has path-quads

⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩

with one root 2-overlapand one root 3-overlap

lArr997904 Φ119886119887119888119889

(C1)Then we construct a dependency graph shown in

Figure 23 for all cases for four individualsAccording to the dependency graph in Figure 23 the

intermediate steps including Cases 34 and 35 are already

proved for the computation of Φ119886119887119888

The correctness of thetransformation fromCase 42 to Case 34 can be proved basedon the recursive formula forΦ

119886119887119888119889andΦ

119886119886119887119888 Similarly we can

obtain the transformation from Case 431 to Case 35

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlaps.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlaps.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlaps and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then, we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8, accordingly.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Figure 24: Dependency graph for different cases for two pairs of individuals.

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA: Ohio State University, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


\[
\begin{aligned}
\Phi_{ab,cd} &= \tfrac{1}{2}\left(\Phi_{fb,cd} + \Phi_{mb,cd}\right) && \text{if $a$ is not an ancestor of $b$ or $c$ or $d$,}\\
\Phi_{aa,bc} &= \tfrac{1}{2}\left(\Phi_{bc} + \Phi_{fm,bc}\right) && \text{if $a$ is not an ancestor of $b$ or $c$,}\\
\Phi_{ab,ac} &= \tfrac{1}{4}\left(2\Phi_{abc} + \Phi_{fb,mc} + \Phi_{mb,fc}\right) && \text{if $a$ is not an ancestor of $b$ or $c$,}\\
\Phi_{aa,ab} &= \tfrac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) && \text{if $a$ is not an ancestor of $b$,}\\
\Phi_{aa,aa} &= \tfrac{1}{4}\left(1 + 3\Phi_{fm}\right) = \tfrac{1}{4}\left(1 + 3F_a\right).
\end{aligned}
\tag{5}
\]

$\Phi_{abc}$ is the probability that randomly chosen alleles at the same locus from each of the three individuals (i.e., $a$, $b$, and $c$) are identical by descent (IBD). Similarly, $\Phi_{abcd}$ is the probability that randomly chosen alleles at the same locus from each of the four individuals (i.e., $a$, $b$, $c$, and $d$) are IBD. $\Phi_{ab,cd}$ is the probability that a random allele from $a$ is IBD with a random allele from $b$ and that a random allele from $c$ is IBD with a random allele from $d$ at the same locus. Note that $\Phi_{abc} = 0$ if there is no common ancestor of $a$, $b$, and $c$; $\Phi_{abcd} = 0$ if there is no common ancestor of $a$, $b$, $c$, and $d$; and $\Phi_{ab,cd} = 0$ in the absence of a common ancestor either for $a$ and $b$ or for $c$ and $d$.
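The recursions in (5) eventually bottom out in ordinary kinship coefficients. As a minimal sketch (not the authors' implementation), the Python below evaluates $\Phi_{ab}$ with the standard recursion $\Phi_{aa} = \tfrac{1}{2}(1 + \Phi_{fm})$, $\Phi_{ab} = \tfrac{1}{2}(\Phi_{fb} + \Phi_{mb})$ and then the last line of (5). The pedigree encoding (`PEDIGREE`), the helper `generation`, and the function names are assumptions made for illustration only.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical pedigree: individual -> (father, mother); founders map to (None, None).
PEDIGREE = {
    "F": (None, None), "M": (None, None),
    "s1": ("F", "M"), "s2": ("F", "M"),   # full siblings
    "x": ("s1", "s2"),                     # child of a sib mating
}

@lru_cache(maxsize=None)
def generation(ind):
    """Founders are generation 0; a child is one level below its deepest parent."""
    parents = [p for p in PEDIGREE[ind] if p is not None]
    return 0 if not parents else 1 + max(generation(p) for p in parents)

@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient Phi_ab by the standard two-individual recursion."""
    if a is None or b is None:
        return Fraction(0)
    if a == b:
        f, m = PEDIGREE[a]
        return Fraction(1, 2) * (1 + kinship(f, m))
    # Expand the individual of the later generation: it cannot be an ancestor of the other.
    if generation(a) < generation(b):
        a, b = b, a
    f, m = PEDIGREE[a]
    return Fraction(1, 2) * (kinship(f, b) + kinship(m, b))

def inbreeding(a):
    """F_a equals the kinship coefficient of a's parents."""
    f, m = PEDIGREE[a]
    return kinship(f, m)

def phi_aa_aa(a):
    """Last line of (5): Phi_{aa,aa} = (1 + 3 F_a) / 4."""
    return Fraction(1, 4) * (1 + 3 * inbreeding(a))
```

Exact rationals (`Fraction`) are used so that the small values arising in deep pedigrees are not blurred by floating-point rounding.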

2.2. Identity Coefficients and Condensed Identity Coefficients. Given two individuals $a$ and $b$ with maternally and paternally derived alleles at a fixed autosomal locus, there are 15 possible identity states, and the probabilities associated with each identity state are called identity coefficients. Ignoring the distinction between maternally and paternally derived alleles, we categorize the 15 possible states into 9 condensed identity states, as shown in Figure 1. The states range from state 1, in which all four alleles are IBD, to state 9, in which none of the four alleles are IBD. The probabilities associated with each condensed identity state are called condensed identity coefficients, denoted by $\{\Delta_i \mid 1 \le i \le 9\}$. The condensed identity coefficients can be computed based on generalized kinship coefficients using the linear transformation shown in (6).

\[
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 & 1\\
2 & 2 & 1 & 1 & 2 & 2 & 1 & 1 & 1\\
4 & 0 & 2 & 0 & 2 & 0 & 2 & 1 & 0\\
8 & 0 & 4 & 0 & 2 & 0 & 2 & 1 & 0\\
8 & 0 & 2 & 0 & 4 & 0 & 2 & 1 & 0\\
16 & 0 & 4 & 0 & 4 & 0 & 2 & 1 & 0\\
4 & 4 & 2 & 2 & 2 & 2 & 1 & 1 & 1\\
16 & 0 & 4 & 0 & 4 & 0 & 4 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
\Delta_1\\ \Delta_2\\ \Delta_3\\ \Delta_4\\ \Delta_5\\ \Delta_6\\ \Delta_7\\ \Delta_8\\ \Delta_9
\end{bmatrix}
=
\begin{bmatrix}
1\\ 2\Phi_{aa}\\ 2\Phi_{bb}\\ 4\Phi_{ab}\\ 8\Phi_{aab}\\ 8\Phi_{abb}\\ 16\Phi_{aabb}\\ 4\Phi_{aa,bb}\\ 16\Phi_{ab,ab}
\end{bmatrix}
\tag{6}
\]

In our work, we focus on deriving the path-counting formulas for the generalized kinship coefficients, including $\Phi_{abc}$, $\Phi_{abcd}$, and $\Phi_{ab,cd}$.
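Once the generalized kinship coefficients on the right-hand side of (6) are known, the condensed identity coefficients follow by solving a 9-by-9 linear system. The following is a minimal sketch under stated assumptions: the function names `solve` and `condensed_identity` and the argument order are hypothetical, and exact Gauss-Jordan elimination over rationals is one possible choice of solver, not necessarily the one used by the authors.

```python
from fractions import Fraction

# Coefficient matrix of (6); column j corresponds to Delta_{j+1}.
M = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [2, 2, 2, 2, 1, 1, 1, 1, 1],
    [2, 2, 1, 1, 2, 2, 1, 1, 1],
    [4, 0, 2, 0, 2, 0, 2, 1, 0],
    [8, 0, 4, 0, 2, 0, 2, 1, 0],
    [8, 0, 2, 0, 4, 0, 2, 1, 0],
    [16, 0, 4, 0, 4, 0, 2, 1, 0],
    [4, 4, 2, 2, 2, 2, 1, 1, 1],
    [16, 0, 4, 0, 4, 0, 4, 1, 0],
]

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination over exact rationals."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Pick any row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def condensed_identity(phi_aa, phi_bb, phi_ab, phi_aab, phi_abb,
                       phi_aabb, phi_aa_bb, phi_ab_ab):
    """Return [Delta_1, ..., Delta_9] from the coefficients on the right of (6)."""
    rhs = [1, 2 * phi_aa, 2 * phi_bb, 4 * phi_ab, 8 * phi_aab,
           8 * phi_abb, 16 * phi_aabb, 4 * phi_aa_bb, 16 * phi_ab_ab]
    return solve(M, rhs)
```

For two unrelated, non-inbred individuals ($\Phi_{aa} = \Phi_{bb} = 1/2$, $\Phi_{aa,bb} = 1/4$, all other coefficients zero), the solution places all probability mass on state 9, as expected.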

2.3. Terms Defined for Path-Counting Formulas for Three and Four Individuals

(1) Triple-Common Ancestor. Given three individuals $a$, $b$, and $c$, if $A$ is a common ancestor of the three individuals, then we call $A$ a triple-common ancestor of $a$, $b$, and $c$.

(2) Quad-Common Ancestor. Given four individuals $a$, $b$, $c$, and $d$, if $A$ is a common ancestor of the four individuals, then we call $A$ a quad-common ancestor of $a$, $b$, $c$, and $d$.

(3) $P(A, a)$. It denotes the set of all possible paths from $A$ to $a$, where the paths can only traverse edges in the direction of parent to child, such that $P(A, a) \neq \mathrm{NULL}$ if and only if $A$ is an ancestor of $a$. $P_{Aa}$ denotes a particular path from $A$ to $a$, where $P_{Aa} \in P(A, a)$.

(4) Path-Pair. It consists of two paths, denoted as $\langle P_{Aa}, P_{Ab}\rangle$, where $P_{Aa} \in P(A, a)$ and $P_{Ab} \in P(A, b)$.

(5) Nonoverlapping Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab}\rangle$, it is nonoverlapping if and only if the two paths share no common individuals, except $A$.

(6) Path-Triple. It consists of three paths, denoted as $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, where $P_{Aa} \in P(A, a)$, $P_{Ab} \in P(A, b)$, and $P_{Ac} \in P(A, c)$.

(7) Path-Quad. It consists of four paths, denoted as $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$, where $P_{Aa} \in P(A, a)$, $P_{Ab} \in P(A, b)$, $P_{Ac} \in P(A, c)$, and $P_{Ad} \in P(A, d)$.

(8) $\mathrm{Bi\_C}(P_{Aa}, P_{Ab})$. It denotes all common individuals shared between $P_{Aa}$ and $P_{Ab}$, except $A$.

(9) $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$. It denotes all common individuals shared among $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, except $A$.

(10) $\mathrm{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$. It denotes all common individuals shared among $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$, except $A$.

(11) Crossover and 2-Overlap Individual. If $s \in \mathrm{Bi\_C}(P_{Aa}, P_{Ab})$, we call $s$ a crossover individual with respect to $P_{Aa}$ and $P_{Ab}$ if the two paths pass through different parents of $s$. On the other hand, if $P_{Aa}$ and $P_{Ab}$ pass through the same parent of $s$, then we call $s$ a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$.

(12) 3-Overlap Individual. If $s \in \mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$ and the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$ pass through the same parent of $s$, then we call $s$ a 3-overlap individual with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$.

(13) 2-Overlap Path. If $s$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, then both $P_{Aa}$ and $P_{Ab}$ pass through the same parent of $s$, denoted by $p$, and the edge from $p$ to $s$ is called an overlap edge. All consecutive overlap edges constitute a path, and this path is called a 2-overlap path. If the 2-overlap path extends all the way to the ancestor $A$, we call it a root 2-overlap path.

(14) 3-Overlap Path. It consists of all 3-overlap individuals in a consecutive order. If the 3-overlap path extends all the way to the root $A$, we call it a root 3-overlap path.

Figure 1: The 15 possible identity states for individuals $a$ and $b$, grouped by their 9 condensed states. Lines indicate alleles that are IBD.

Figure 2: Examples of path-pairs and path-triples.
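The definitions of $P(A, a)$, $\mathrm{Bi\_C}$, $\mathrm{Tri\_C}$, and $\mathrm{Quad\_C}$ can be made concrete with a short sketch. The pedigree below is hypothetical: its edges are assembled from the example paths quoted in this paper (such as $A \to s \to e \to t \to a$) and it is not claimed to reproduce Figure 2 exactly. The names `CHILDREN`, `all_paths`, and `shared` are likewise illustrative assumptions.

```python
# Hypothetical pedigree as a parent -> children adjacency list.
CHILDREN = {
    "A": ["s", "d", "c"],
    "s": ["e", "f"],
    "d": ["f"],
    "c": ["e"],
    "e": ["t"],
    "f": ["t"],
    "t": ["a", "b"],
}

def all_paths(src, dst):
    """P(src, dst): every parent-to-child path from src to dst (empty if none)."""
    if src == dst:
        return [[src]]
    paths = []
    for child in CHILDREN.get(src, []):
        for tail in all_paths(child, dst):
            paths.append([src] + tail)
    return paths

def shared(*paths):
    """Individuals common to all given paths, excluding their root.

    With two, three, or four paths this plays the role of Bi_C, Tri_C, or Quad_C.
    """
    root = paths[0][0]
    common = set(paths[0]).intersection(*map(set, paths[1:]))
    common.discard(root)
    return common
```

In this toy pedigree every path from `A` to `a` or `b` passes through `t`, so no path-pair from `A` to `a` and `b` is nonoverlapping.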

Example 1. Consider the path-pairs from $A$ to $a$ and $b$ in Figure 2, where $A$ is a common ancestor of $a$ and $b$. For path-pair1, $\mathrm{Bi\_C}(P_{Aa}, P_{Ab}) = \{s, e, t\}$, and $A \to s \to e \to t$ is a root 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$. For path-pair4, $\mathrm{Bi\_C}(P_{Aa}, P_{Ab}) = \{e, t\}$, where $e$ is a crossover individual, $t$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, and $e \to t$ is a 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$; it is not a root 2-overlap path, because it does not extend to $A$.

Example 2. There are four path-quads listed in Figure 3 from $A$ to four individuals $a$, $b$, $c$, and $d$, where $A$ is a quad-common ancestor of the four individuals. For path-quad2, considering the paths $P_{Aa}$ and $P_{Ab}$, the path $A \to t \to f \to s$ is a root 2-overlap path; $t$, $f$, and $s$ are 2-overlap individuals with respect to $P_{Aa}$ and $P_{Ab}$. For path-quad3, $t$, $f$, and $s$ are 3-overlap individuals with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, and the path $A \to t \to f \to s$ is a root 3-overlap path.

Then, we summarize all the conceptual terms used in the path-counting formulas for two individuals, three individuals, and four individuals in Table 1, which reveals a glimpse of our framework for generalizing Wright's formula to three and four individuals from a terminology aspect.

Figure 3: Examples of path-quads.

Table 1: The conceptual terms used for two, three, and four individuals.

Two individuals                      Three individuals                             Four individuals
Common ancestor                      Triple-common ancestor                        Quad-common ancestor
Path-pair                            Path-triple                                   Path-quad
$\mathrm{Bi\_C}(P_{Aa}, P_{Ab})$     $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$     $\mathrm{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$
N/A                                  2-Overlap individual                          3-Overlap individual
N/A                                  2-Overlap path                                3-Overlap path
N/A                                  Root 2-overlap path                           Root 3-overlap path
N/A                                  Crossover individual                          Crossover individual

2.4. An Overview of Path-Counting Formula Derivation. According to Wright's path-counting formula [16] (see (2)) for two individuals $a$ and $b$, the path-counting approach requires identifying common ancestors of $a$ and $b$ and calculating the contribution of each common ancestor to $\Phi_{ab}$. More specifically, for each common ancestor, denoted as $A$, we obtain all path-pairs from $A$ to $a$ and $b$ and identify acceptable path-pairs. For $\Phi_{ab}$, an acceptable path-pair $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair, where the two paths share no common individuals, except $A$. In Figure 2, path-pair2 is an acceptable path-pair, while path-pair1, path-pair3, and path-pair4 are not acceptable path-pairs. The contribution of each common ancestor $A$ to $\Phi_{ab}$ is computed based on the inbreeding coefficient of $A$, modified by the length of each acceptable path-pair.
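The two-individual procedure just described can be sketched in a few lines. This is a minimal illustration, assuming the usual form of Wright's formula, in which each nonoverlapping path-pair from a common ancestor $A$ contributes $(1/2)^{L(P_{Aa}) + L(P_{Ab}) + 1}(1 + F_A)$, with $L$ the number of edges and $F_A$ the inbreeding coefficient of $A$. The pedigree `PARENTS` and the function names are hypothetical, and the sketch enumerates paths naively rather than using the path encoding schemes of [12, 13].

```python
from fractions import Fraction

# Hypothetical pedigree: individual -> (father, mother); founders have (None, None).
PARENTS = {
    "F": (None, None), "M": (None, None), "U": (None, None),
    "s1": ("F", "M"), "s2": ("F", "M"),   # full siblings
    "x": ("s1", "s2"),                     # inbred child of a sib mating
    "y": ("x", "U"),
}

def paths_down(anc, ind):
    """All parent-to-child paths from anc to ind; [[ind]] when anc == ind."""
    if anc == ind:
        return [[ind]]
    out = []
    for parent in PARENTS[ind]:
        if parent is not None:
            out += [p + [ind] for p in paths_down(anc, parent)]
    return out

def kinship(a, b):
    """Phi_ab for a != b, summed over common ancestors and nonoverlapping path-pairs."""
    total = Fraction(0)
    for anc in PARENTS:
        # Acceptable path-pairs: the two paths share no individual except anc.
        pairs = [(pa, pb)
                 for pa in paths_down(anc, a)
                 for pb in paths_down(anc, b)
                 if (set(pa) & set(pb)) == {anc}]
        if not pairs:
            continue
        f, m = PARENTS[anc]
        # F_anc is the kinship of anc's parents (0 when a parent is unknown).
        f_anc = kinship(f, m) if f is not None and m is not None else Fraction(0)
        for pa, pb in pairs:
            edges = (len(pa) - 1) + (len(pb) - 1)
            total += Fraction(1, 2) ** (edges + 1) * (1 + f_anc)
    return total
```

In this toy pedigree, `kinship("x", "y")` receives its only contribution from `x` itself: every path-pair rooted at `F`, `M`, `s1`, or `s2` also shares `x` and is therefore rejected as overlapping.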

To compute $\Phi_{abc}$, the path-counting approach requires identifying all triple-common ancestors of $a$, $b$, and $c$ and summing up all triple-common ancestors' contributions to $\Phi_{abc}$. For each triple-common ancestor, denoted as $A$, we first identify all path-triples, each of which consists of three paths from $A$ to $a$, $b$, and $c$, respectively. Some examples of path-triples are presented in Figure 2.

For $\Phi_{ab}$, only nonoverlapping path-pairs are acceptable. A path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ consists of three path-pairs $\langle P_{Aa}, P_{Ab}\rangle$, $\langle P_{Aa}, P_{Ac}\rangle$, and $\langle P_{Ab}, P_{Ac}\rangle$. For $\Phi_{abc}$, a path-triple might be acceptable even though either 2-overlap individuals or crossover individuals exist between a path-pair. The main challenge we need to address is finding necessary and sufficient conditions for acceptable path-triples.

Aiming at solving the problem of identifying acceptable path-triples, we first use a systematic method to generate all possible cases for a path-pair by considering different types of common individuals shared between the two paths. Then, we introduce building blocks, which are connected graphs with conditions on every edge in the graph that encapsulate a set of acceptable cases of path-pairs. In each building block, we represent paths as nodes and interactions (i.e., shared common individuals between two paths) as edges. There are at least two paths in a building block. For each building block, we obtain all acceptable cases for the concerned path-pairs. A given path-triple can be decomposed into one or multiple building blocks. Considering a shared path-pair between two building blocks, we use the natural join operator from relational algebra to match the acceptable cases for the shared path-pair between the two building blocks. In other words, taking the acceptable cases for building blocks as inputs, we use the natural join operator to construct all acceptable cases for a path-triple. Acceptable cases for a path-triple are identified and then used in deriving the path-counting formula for $\Phi_{abc}$.
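The natural join step can be illustrated with a small sketch. Each building block is modeled as a relation, that is, a list of rows mapping a path-pair name to the case it takes; two rows combine when they agree on every path-pair they have in common. The relations `block1` and `block2` and their case labels are purely hypothetical placeholders and are not the acceptable cases derived in this paper.

```python
def natural_join(left, right):
    """Natural join of two relations given as lists of dicts.

    Rows are combined when they agree on all attributes (here: path-pairs)
    that appear in both relations.
    """
    joined = []
    for lrow in left:
        for rrow in right:
            common = set(lrow) & set(rrow)
            if all(lrow[k] == rrow[k] for k in common):
                joined.append({**lrow, **rrow})
    return joined

# Hypothetical acceptable cases; "ab" stands for the path-pair <P_Aa, P_Ab>, etc.
block1 = [
    {"ab": "T", "ac": "T"},
    {"ab": "X", "ac": "none"},
]
block2 = [
    {"ac": "T", "bc": "T"},
    {"ac": "none", "bc": "X"},
    {"ac": "none", "bc": "none"},
]

# Candidate cases for the whole path-triple: rows consistent on the shared pair "ac".
triple_cases = natural_join(block1, block2)
```

The join keeps only combinations in which the shared path-pair (`"ac"` here) is assigned the same case by both building blocks, which is the matching role the text assigns to the natural join operator.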

Then, we summarize all the main procedures used for deriving the path-counting formula for $\Phi_{abc}$ in a flowchart, shown in Figure 4. The main procedures are also applicable for deriving the path-counting formulas for $\Phi_{abcd}$ and $\Phi_{ab,cd}$.

3. Results and Discussion

3.1. Path-Counting Formulas for Three Individuals. We first introduce a systematic method to generate all possible cases for a path-pair. Then, we discuss building blocks for path-triples and identify all acceptable cases, which are used in deriving the path-counting formula for $\Phi_{abc}$.

Figure 4: A flowchart for path-counting formula derivation.

3.1.1. Cases for a Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab}\rangle$ with $\mathrm{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathrm{NULL}$, where $A$ is a common ancestor of $a$ and $b$ and $\mathrm{Bi\_C}(P_{Aa}, P_{Ab})$ consists of all common individuals shared between $P_{Aa}$ and $P_{Ab}$, except $A$, we introduce three patterns (i.e., crossover, 2-overlap, and root 2-overlap) to generate all possible cases for $\langle P_{Aa}, P_{Ab}\rangle$:

(1) $X(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ share one or multiple crossover individuals.

(2) $T(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are root 2-overlapping from $A$, and the root 2-overlap path can have one or multiple 2-overlap individuals.

(3) $Y(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are overlapping but not from $A$, and the 2-overlap path can have one or multiple 2-overlap individuals.

Based on the three patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$, we use regular expressions to generate all possible cases for the path-pair $\langle P_{Aa}, P_{Ab}\rangle$. For convenience, we drop $\langle P_{Aa}, P_{Ab}\rangle$ and use $X$, $T$, and $Y$ instead of the patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$ whenever there is no confusion. When $\mathrm{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathrm{NULL}$, the eight cases shown in (7) cover all possible cases for $\langle P_{Aa}, P_{Ab}\rangle$. The completeness of the eight cases shown in (7) for $\langle P_{Aa}, P_{Ab}\rangle$ can be proved by induction on the total number of $T$, $X$, and $Y$ appearing in $\langle P_{Aa}, P_{Ab}\rangle$. Using the pedigree in Figure 2, Cases 1–3 and Case 6 are illustrated in (8), (9), (10), and (11).

\[
\begin{aligned}
&\text{Case 1: } T\\
&\text{Case 2: } X^{+}\\
&\text{Case 3: } TX^{+}\\
&\text{Case 4: } T(X^{+}Y)^{+}\\
&\text{Case 5: } T(X^{+}Y)^{+}X^{+}\\
&\text{Case 6: } X^{+}Y\\
&\text{Case 7: } X^{+}(YX^{+})^{+}\\
&\text{Case 8: } X^{+}(YX^{+})^{+}Y
\end{aligned}
\tag{7}
\]
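Because the eight cases in (7) are regular expressions over the alphabet $\{T, X, Y\}$, a path-pair that has already been encoded as a string of these symbols can be assigned to its case with an ordinary regex engine. The sketch below is illustrative only; the function name `classify` and the string encoding (one symbol per crossover individual, one symbol per overlap path) are assumptions.

```python
import re

# The eight cases of (7), written as Python regular expressions.
CASES = [
    (1, r"T"),
    (2, r"X+"),
    (3, r"TX+"),
    (4, r"T(X+Y)+"),
    (5, r"T(X+Y)+X+"),
    (6, r"X+Y"),
    (7, r"X+(YX+)+"),
    (8, r"X+(YX+)+Y"),
]

def classify(pattern):
    """Return the case number of (7) matched by the whole string, or None."""
    for number, regex in CASES:
        if re.fullmatch(regex, pattern):
            return number
    return None
```

Strings such as `"Y"` or `"TY"` match none of the cases, which reflects that a non-root 2-overlap path can only begin at a crossover individual.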

Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.

\[
\langle A \to s \to e \to t \to a,\; A \to s \to e \to t \to b \rangle \in T, \tag{8}
\]

where $s$, $e$, and $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path.

\[
\langle A \to s \to e \to t \to a,\; A \to s \to f \to t \to b \rangle \in TX, \tag{9}
\]

where $s$ is a 2-overlap individual and the overlap path is a root 2-overlap path; $t$ is a crossover individual.

\[
\langle A \to s \to e \to t \to a,\; A \to d \to f \to t \to b \rangle \in X, \tag{10}
\]

where $t$ is a crossover individual.

\[
\langle A \to c \to e \to t \to a,\; A \to s \to e \to t \to b \rangle \in XY, \tag{11}
\]

where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.
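The encodings in (8)–(11) can be reproduced mechanically from the two paths. The following sketch, whose function name `pattern` and list-based path representation are assumptions, walks along one path, compares the parent through which each shared individual is reached on both paths, and emits `X` for a crossover individual, `T` for a run of overlap edges starting at the root, and `Y` for a run of overlap edges starting elsewhere.

```python
def pattern(path_a, path_b):
    """Encode the individuals shared by two paths from the same root as a T/X/Y string."""
    root = path_a[0]
    assert path_b[0] == root, "both paths must start at the common ancestor"
    # Map each individual to the parent through which the path reaches it.
    pred_a = {child: parent for parent, child in zip(path_a, path_a[1:])}
    pred_b = {child: parent for parent, child in zip(path_b, path_b[1:])}
    symbols = []
    in_overlap = False  # True while we are inside a run of consecutive overlap edges
    for ind in path_a[1:]:
        if ind not in pred_b:
            in_overlap = False          # not shared: any overlap run has ended
            continue
        if pred_a[ind] == pred_b[ind]:
            # Same parent on both paths: ind is a 2-overlap individual.
            if not in_overlap:
                symbols.append("T" if pred_a[ind] == root else "Y")
                in_overlap = True
        else:
            # Different parents: ind is a crossover individual.
            symbols.append("X")
            in_overlap = False
    return "".join(symbols)
```

Applied to the four path-pairs in (8)–(11), the function yields the strings `T`, `TX`, `X`, and `XY`, respectively.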

3.1.2. Path-Pair Level Graphical Representation of a Path-Triple. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, we represent each path as a node. The path-triple can be decomposed into three path-pairs (i.e., $\langle P_{Aa}, P_{Ab}\rangle$, $\langle P_{Aa}, P_{Ac}\rangle$, and $\langle P_{Ab}, P_{Ac}\rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual), except $A$, then there is an edge between the two nodes representing the two paths. Therefore, we obtain four different scenarios $S_0$–$S_3$, shown in Figure 5.
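The path-pair level graph can be built directly from the three paths. The sketch below assumes that $S_k$ denotes the scenario whose graph has $k$ edges (the four graphs on three nodes up to relabeling); this reading of Figure 5, the function name `scenario`, and the example paths are assumptions made for illustration.

```python
from itertools import combinations

def scenario(paths):
    """Return (k, edges) for a path-triple given as {name: path from the root}.

    An edge joins two paths when they share an individual other than the root;
    k is the number of such edges.
    """
    root = next(iter(paths.values()))[0]
    edges = []
    for (n1, p1), (n2, p2) in combinations(sorted(paths.items()), 2):
        if (set(p1) & set(p2)) - {root}:
            edges.append((n1, n2))
    return len(edges), edges

# Hypothetical path-triples (individual names are placeholders).
independent = {"a": ["A", "s", "a"], "b": ["A", "d", "b"], "c": ["A", "c"]}
one_edge = {"a": ["A", "s", "e", "t", "a"], "b": ["A", "s", "e", "t", "b"], "c": ["A", "c"]}
two_edges = {"a": ["A", "s", "a"], "b": ["A", "s", "u", "b"], "c": ["A", "v", "u", "c"]}
three_edges = {"a": ["A", "s", "e", "a"], "b": ["A", "s", "f", "b"], "c": ["A", "s", "c"]}
```

A path-triple with no edges corresponds to three independent paths, which is the situation described for $S_0$.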

In Figure 5, the scenario $S_0$ has no edges, which means that $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ consists of three independent paths. In Figure 2, path-triple1 is an example of $S_0$. Next, we introduce a lemma which can assist with identifying the options for the edges in the scenarios $S_1$–$S_3$.

Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab}\rangle$, $\langle P_{Aa}, P_{Ac}\rangle$, and $\langle P_{Ab}, P_{Ac}\rangle$. If there is a 2-overlap edge, which is represented by $Y$ in the regular expression representation, in any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no contribution to $\Phi_{abc}$.

Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at the allele level, starting from a triple-common ancestor $A$ to the three individuals $a$, $b$, and $c$.

Figure 6: Examples of pedigree and inheritance paths. (a) Pedigree. (b) Inheritance paths.

For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ listed as follows: $P_{Aa}$: $A \to a$; $P_{Ab}$: $A \to p_3 \to p_6 \to p_7 \to b$; $P_{Ac}$: $A \to p_4 \to p_6 \to p_7 \to c$.

For $\langle P_{Ab}, P_{Ac} \rangle$, $p_6$ is a crossover individual, $p_7$ is an overlap individual, and $p_6 \to p_7$ is a 2-overlap edge, represented by $Y$ in the regular expression representation (see the definition of $Y$ in Section 3.1.1).

For the individual $p_6$, let us denote the two alleles at one fixed autosomal locus as $g_1$ and $g_2$. At allele-level, only one allele can be passed down from $p_6$ to $p_7$. Since $p_3$ and $p_4$ are parents of $p_6$, $g_1$ is passed down from one parent and $g_2$ is passed down from the other parent. It is infeasible to pass down both $g_1$ and $g_2$ from $p_6$ to $p_7$. In other words, there are no corresponding inheritance paths for the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with a 2-overlap edge between $\langle P_{Ab}, P_{Ac} \rangle$ (i.e., Case 6: $XY$). Therefore, such path-triples have no contribution to $\Phi_{abc}$.

Figure 6(b) shows one example of eligible inheritance paths corresponding to a pedigree graph. Each individual is represented by two allele nodes. The eligible inheritance paths in Figure 6(b) consist of red edges only.

Only Case 1, Case 2, and Case 3 do not have $Y$ in the regular expression representation of a path-pair (see (7)). Considering the scenarios $S_1$–$S_3$ shown in Figure 5, an edge can therefore have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$.
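To make the role of $Y$ concrete, the following Python sketch labels the shared individuals of a path-pair and applies Lemma 3 to a path-triple. It is an illustration under stated assumptions, not the authors' implementation: paths are simple lists of identifiers starting at the same ancestor, a shared individual reached through different parents is treated as a crossover ($X$), one reached through the same parent with identical prefixes is treated as a root overlap ($T$), and one reached through the same parent with different prefixes is treated as a non-root 2-overlap ($Y$). The names `classify_pair` and `may_contribute` are hypothetical.

```python
from itertools import combinations

def classify_pair(p, q):
    """Return the labels from {"T", "X", "Y"} that occur for the path-pair (p, q)."""
    labels = set()
    pos_q = {node: j for j, node in enumerate(q)}
    for i, node in enumerate(p):
        if i == 0 or node not in pos_q:
            continue  # skip the common ancestor and unshared individuals
        j = pos_q[node]
        if p[i - 1] != q[j - 1]:
            labels.add("X")   # reached through different parents: crossover
        elif p[: i + 1] == q[: j + 1]:
            labels.add("T")   # identical from the ancestor down: root overlap
        else:
            labels.add("Y")   # same parent, different history: 2-overlap edge
    return labels

def may_contribute(path_triple):
    """Lemma 3: a path-triple with a Y in any of its path-pairs contributes nothing."""
    return all("Y" not in classify_pair(p, q)
               for p, q in combinations(path_triple, 2))

# Path-triple from the Figure 6 discussion.
P_Aa = ["A", "a"]
P_Ab = ["A", "p3", "p6", "p7", "b"]
P_Ac = ["A", "p4", "p6", "p7", "c"]
print(classify_pair(P_Ab, P_Ac), may_contribute([P_Aa, P_Ab, P_Ac]))
```

For the Figure 6 triple, $p_6$ is labeled as a crossover and $p_7$ as a non-root overlap, so the triple is rejected, in line with the proof above.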

3.1.3. Constructing Cases for a Path-Triple. For the scenarios $S_1$–$S_3$ in Figure 5, we define two building blocks $\{B_1, B_2\}$ along with some rules in Figure 7 to generate acceptable cases. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, we cannot allow both edges to be root overlap, because if two edges are root overlap, then $P_{Aa}$ and $P_{Ac}$ must share at least one common individual other than $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

Figure 7: Building blocks $\{B_1, B_2\}$ and basic rules. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, there can be at most one edge belonging to root overlap (either $T$ or $TX$).

Figure 8: A graphical illustration for obtaining $T_3$: the scenario $S_3$ with edges $e_1$, $e_2$, $e_3$ is decomposed into $u_1$, $u_2$, $u_3$, and $T_3 = R_1 \bowtie R_2 \bowtie R_3$. Note: $R_i$ denotes all acceptable path-triples for $u_i$.

Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ into $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples, denoted as $R_i$.

Considering the dependency among $\{R_1, R_2, R_3\}$, we use the natural join operator, denoted as $\bowtie$, operating on $\{R_1, R_2, R_3\}$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in the scenario $S_3$.

For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. The scenario $S_0$ has no edges, which shows that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or
(2) one edge belongs to root 2-overlap and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but zero 2-overlap paths.
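The natural-join construction of $T_3$ can be sketched in Python as follows. This is a minimal illustration under assumptions: each relation is a list of dictionaries mapping an edge name to its label, the three $B_2$ blocks are taken to cover the edge pairs $(e_1, e_2)$, $(e_2, e_3)$, and $(e_1, e_3)$, and the helper names `block_b2` and `natural_join` are hypothetical.

```python
from itertools import product

EDGE_OPTIONS = ("T", "X", "TX")
ROOT_OVERLAP = {"T", "TX"}

def block_b2(edge_a, edge_b):
    """Acceptable labelings of one B2 block: at most one root-overlap edge."""
    rows = []
    for la, lb in product(EDGE_OPTIONS, repeat=2):
        if (la in ROOT_OVERLAP) + (lb in ROOT_OVERLAP) <= 1:
            rows.append({edge_a: la, edge_b: lb})
    return rows

def natural_join(left, right):
    """Relational natural join: keep row pairs that agree on all shared attributes."""
    joined = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined

R1 = block_b2("e1", "e2")  # assumed edge assignment for u1
R2 = block_b2("e2", "e3")  # assumed edge assignment for u2
R3 = block_b2("e1", "e3")  # assumed edge assignment for u3
T3 = natural_join(natural_join(R1, R2), R3)

for row in T3:
    print(row)
```

Every row that survives the join has at most one root-overlap edge across all three edges, which matches the two options listed above for $S_k$.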

3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals into a noncrossover path-triple without changing the contribution of this path-triple to $\Phi_{abc}$. The main purpose of the splitting operator is to simplify the derivation of the path-counting formula. We first use the example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ into two nodes $s_1$ and $s_2$;
(2) transform the edges $s \to a'$ and $s \to b'$ to $s_1 \to a'$ and $s_2 \to b'$, respectively;
(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
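The three steps can be sketched in Python on a child-adjacency representation of the pedigree graph. The sketch rests on assumptions that go beyond the text: the graph is a dictionary mapping each individual to the set of its children, $s$ has no children other than $a'$ and $b'$, and (following the paths listed for $G_k$ in Figure 9) the incoming edge from $x$ is redirected to $s_1$ and the incoming edge from $w$ to $s_2$. The function name `split_crossover` is hypothetical.

```python
def split_crossover(children, s, x, w, a_next, b_next):
    """Return a copy of the graph in which the crossover individual s is split."""
    g = {u: set(vs) for u, vs in children.items()}
    s1, s2 = s + "1", s + "2"          # step (1): two new nodes replace s
    g[x].remove(s)
    g[x].add(s1)                       # assumed: the parent on P_Aa now points to s1
    g[w].remove(s)
    g[w].add(s2)                       # assumed: the parent on P_Ab now points to s2
    g.pop(s)
    g[s1] = {a_next, b_next}           # steps (2) and (3): s1 -> a' and s1 -> b'
    g[s2] = {a_next, b_next}           # steps (2) and (3): s2 -> b' and s2 -> a'
    return g

# Small graph shaped like G_{k+1} in Figure 9 (identifiers are illustrative).
pedigree = {
    "A": {"x", "w", "c"},
    "x": {"s"},
    "w": {"s"},
    "s": {"a'", "b'"},
    "a'": {"a"},
    "b'": {"b"},
}
split_graph = split_crossover(pedigree, "s", "x", "w", "a'", "b'")
print(sorted(split_graph["s1"]), sorted(split_graph["s2"]))
```

After the split, the paths $A \to x \to s_1 \to a' \to a$ and $A \to w \to s_2 \to b' \to b$ share no individual other than $A$, so the crossover at $s$ has disappeared from this path-pair.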

Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.

Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If a new crossover node appears, the only possible node is either $a'$ or $b'$. Assume $b'$ becomes a crossover individual; this means that $b'$ is able to reach $a$ and $b$ from two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.

Next, we introduce a canonical graph, which results from applying the splitting operator to all crossover individuals. The canonical graph has zero crossover individuals.

Definition 5 (Canonical Graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, if there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$, and
(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$,

then we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.

Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, there exists a canonical graph $G'$ for $G$.

Figure 9: Transforming pedigree graph $G_{k+1}$ having $k + 1$ crossovers to $G_k$ having $k$ crossovers (edges denote ancestor-descendant or parent-child relationships). For $G_{k+1}$, $\langle P \rangle = \{P_{Aa}\colon A \to \cdots \to x \to s \to a' \to \cdots \to a;\; P_{Ab}\colon A \to \cdots \to w \to s \to b' \to \cdots \to b;\; P_{Ac}\colon A \to c\}$. For $G_k$, $\langle P \rangle = \{P_{Aa}\colon A \to \cdots \to x \to s_1 \to a' \to \cdots \to a;\; P_{Ab}\colon A \to \cdots \to w \to s_2 \to b' \to \cdots \to b;\; P_{Ac}\colon A \to c\}$.

Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ (scenarios $S_0$–$S_{10}$).

Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4.

3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:

\[
\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AA} \right), \tag{12}
\]

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$ is the inbreeding coefficient of $A$; $A$ is a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

\[
L_{\text{triple}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2},
\end{cases} \tag{13}
\]

and $L_{P_{Aa}}$ is the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).

For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
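As a worked illustration of (12) and (13), the following Python sketch evaluates the contribution of a single acceptable path-triple with exact fractions. It assumes that the path lengths and the inbreeding coefficient $F_A$ are already known; the function names and the argument `root_overlap_len` (the length of $P_{As}$, or `None` for Type 1) are hypothetical.

```python
from fractions import Fraction

def phi_AA(F_A):
    """Phi_AA = (1/2)(1 + F_A)."""
    return Fraction(1, 2) * (1 + F_A)

def phi_AAA(F_A):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * F_A)

def triple_contribution(len_a, len_b, len_c, F_A=Fraction(0), root_overlap_len=None):
    """Contribution of one acceptable path-triple to Phi_abc, following (12) and (13)."""
    total = len_a + len_b + len_c
    if root_overlap_len is None:
        # Type 1: zero root 2-overlap.
        return Fraction(1, 2) ** total * phi_AAA(F_A)
    # Type 2: subtract the length of the root 2-overlap path P_As.
    L_triple = total - root_overlap_len
    return Fraction(1, 2) ** (L_triple + 1) * phi_AA(F_A)

# Type 1 example: A -> a, A -> b, A -> c with a non-inbred ancestor.
type1 = triple_contribution(1, 1, 1)
# Type 2 example: A -> s -> a, A -> s -> b, A -> c, root 2-overlap path A -> s.
type2 = triple_contribution(2, 2, 1, root_overlap_len=1)
print(type1, type2)
```

Summing such contributions over all triple-common ancestors and all acceptable path-triples gives $\Phi_{abc}$; the enumeration of the path-triples themselves is outside the scope of this sketch.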

3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with $\mathrm{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios $S_0$–$S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks $\{B_1, B_2, B_3\}$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable in Figure 11. For $B_3$, we only consider root overlap, because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that for $B_3$, if $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.

3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ into one or multiple building blocks. A scenario $S_i \in \{S_1, S_3\}$ has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$, because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.

Figure 11: Building blocks $\{B_1, B_2, B_3\}$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For $B_3$, with $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$, all three edges belong to root overlap (i.e., having root 3-overlap).

Table 2: Largest subgraph of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).

$S_i$:  $S_4$  $S_5$  $S_7$  $S_8$  $S_9$  $S_{10}$
$S_j$:  $S_3$  $S_3$  $S_6$  $S_5$  $S_7$  $S_9$

For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;
(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;
(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.

For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply this idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;
(2) otherwise, there can be at most two root 2-overlap paths.

3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

\[
\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+2} \Phi_{AA} \right), \tag{14}
\]

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$ is the inbreeding coefficient of $A$; $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively.

\[
L_{\text{quad}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases} \tag{15}
\]

and $L_{P_{Aa}}$ is the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).

For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
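The case analysis in (14) and (15) can be sketched in Python for a single acceptable path-quad. The sketch assumes that the path lengths and the lengths of any root overlap paths have already been determined; the function name `quad_contribution` and its arguments `two_overlaps` (lengths of the root 2-overlap paths) and `three_overlap` (length of the root 3-overlap path, or `None`) are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def quad_contribution(path_lengths, F_A=Fraction(0), two_overlaps=(), three_overlap=None):
    """Contribution of one acceptable path-quad to Phi_abcd, following (14) and (15)."""
    total = sum(path_lengths)
    n2 = len(two_overlaps)
    if three_overlap is None and n2 == 0:        # Type 1
        return HALF ** total * Fraction(1, 8) * (1 + 7 * F_A)
    if three_overlap is None and n2 == 1:        # Type 2
        L_quad = total - two_overlaps[0]
        return HALF ** (L_quad + 1) * Fraction(1, 4) * (1 + 3 * F_A)
    if three_overlap is None and n2 == 2:        # Type 3, Case 1
        L_quad = total - sum(two_overlaps)
    elif three_overlap is not None and n2 == 0:  # Type 3, Case 2
        L_quad = total - 2 * three_overlap
    elif three_overlap is not None and n2 == 1:  # Type 3, Case 3
        L_quad = total - three_overlap - two_overlaps[0]
    else:
        raise ValueError("not an acceptable path-quad")
    return HALF ** (L_quad + 2) * HALF * (1 + F_A)

# Type 1: four independent paths of length 1.
q1 = quad_contribution([1, 1, 1, 1])
# Type 3, Case 1: A -> s -> a, A -> s -> b, A -> t -> c, A -> t -> d.
q2 = quad_contribution([2, 2, 2, 2], two_overlaps=(1, 1))
# Type 3, Case 2: A -> t -> a, A -> t -> b, A -> t -> c, A -> d.
q3 = quad_contribution([2, 2, 2, 1], three_overlap=1)
print(q1, q2, q3)
```

The three examples use a non-inbred ancestor ($F_A = 0$), so the values can be checked by hand against (14).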

Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle = \{A \to s \to a,\; A \to s \to b,\; A \to t \to c,\; A \to t \to d\}$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle = \{A \to x \to a,\; A \to y \to b,\; A \to y \to c,\; A \to x \to d\}$.

3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two pairs of path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathrm{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathrm{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathrm{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.

Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.

3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge can be $\{T, X, TX\}$ (refer to Section 3.1.1 for the definitions of $T$, $X$, and $TX$). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap
 | Zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap
 | Zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps and $r_1 = r_2 = h$
 | One root homo-overlap and two root heter-overlaps and $h = r_1 = r_2$

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
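The recursive definition of $\mathrm{gen}$ translates directly into a short memoized function. The Python sketch below assumes the pedigree is given as a dictionary from each individual to a tuple of its known parents (empty for a progenitor); the helper name `make_gen` and the example pedigree are hypothetical.

```python
from functools import lru_cache

def make_gen(parents):
    """Build gen(x) for a pedigree given as {individual: tuple of known parents}."""
    @lru_cache(maxsize=None)
    def gen(x):
        known = parents.get(x, ())
        if not known:
            return 0                           # progenitor: gen = 0
        # One parent: gen(p) + 1; two parents: MAX{gen(f), gen(m)} + 1.
        return max(gen(p) for p in known) + 1
    return gen

# Illustrative pedigree: p1 and p2 are progenitors.
example_parents = {
    "p1": (),
    "p2": (),
    "f": ("p1",),
    "m": ("p1", "p2"),
    "a": ("f", "m"),
    "g": ("a", "p2"),
}
gen = make_gen(example_parents)
print({x: gen(x) for x in example_parents})
```

Memoization keeps the computation linear in the size of the pedigree, since each individual's generation is evaluated once.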

The path-counting formula for $\Phi_{ab,cd}$ is as follows:

\[
\begin{aligned}
\Phi_{ab,cd} = {} & \sum_{A} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AAA} \\
& \quad + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AA} \Biggr) \\
& + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB},
\end{aligned} \tag{16}
\]

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$ is a common ancestor of $a$ and $b$; and $T$ is a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).

Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level (scenarios $S_0$–$S_{16}$).

Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

\[
\text{Type 3:}
\begin{cases}
\text{zero root homo-overlap and two root heter-overlaps } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2, \text{ respectively}; \\
\text{one root homo-overlap } P_{Ah} \text{ ending at } h \text{ and two root heter-overlaps } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2, \text{ and } r_1 = r_2.
\end{cases} \tag{17}
\]

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individual, and $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individual. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

\[
B =
\begin{cases}
S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\
S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\
T & \text{otherwise},
\end{cases}
\]

\[
\begin{aligned}
L_{\text{2-pair}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{Ah}} & \text{for Type 4},
\end{cases} \\
L_{\langle P_{Sa}, P_{Sb} \rangle} &= L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5}, \\
L_{\langle P_{Tc}, P_{Td} \rangle} &= L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}.
\end{aligned} \tag{18}
\]

Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

\[
\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \cdot \Phi_{TT}. \tag{19}
\]

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.

3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by making comparisons with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node is a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $\mathrm{NC}(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows.

(1) If $u$ is $r$, then $\mathrm{NC}(r)$ contains only one element, the empty string.
(2) Otherwise, let $u$ be a node with $\mathrm{NC}(u)$, and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in $\mathrm{NC}(u)$, a code $xi{*}$ is added to $\mathrm{NC}(v_i)$, where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
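The labeling rules can be sketched in Python as follows. This is an illustration under assumptions rather than the implementation used in the experiments: children are given as ordered lists, genders are encoded as the letters `M` and `F`, each code step is written as the sibling index followed by the gender letter, and the traversal is a queue-based topological order (a node is expanded only after all of its parents) instead of a plain breadth-first search, so that every parent's code set is complete before it is propagated. The function name `assign_nodecodes` is hypothetical.

```python
from collections import deque

def assign_nodecodes(children, gender, root):
    """Return {node: set of NodeCodes} for a pedigree DAG rooted at a single progenitor."""
    indegree = {}
    for u, kids in children.items():
        indegree.setdefault(u, 0)
        for v in kids:
            indegree[v] = indegree.get(v, 0) + 1
    codes = {u: set() for u in indegree}
    codes[root] = {""}                      # rule (1): the root holds the empty string
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            for x in codes[u]:
                # rule (2): append sibling index i and the gender marker of v
                codes[v].add(f"{x}{i}{gender[v]}")
            indegree[v] -= 1
            if indegree[v] == 0:            # all parents processed: v can be expanded
                queue.append(v)
    return codes

# Illustrative pedigree: virtual root r, founders f and m, and their child c.
example_children = {"r": ["f", "m"], "f": ["c"], "m": ["c"]}
example_gender = {"f": "M", "m": "F", "c": "F"}
nodecodes = assign_nodecodes(example_children, example_gender, "r")
print(nodecodes)
```

In this example, the child `c` receives one code per path from the root, one through each parent, which is the property the path-counting computation relies on.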


Computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes scales very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.

Figure 14: The effect of pedigree size on computation efficiency improvement (average time in ms versus individuals in pedigree, for Recursive and NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (average time in ms versus depth, for Recursive and NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula for more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.

Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2}\rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.


Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ (panels $S_0$ to $S_3$; if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the two paths are shown as a single path $P_{Aa}$).

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2}\rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$, by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

\[
\Phi_{aab} = \sum_{A}\Biggl(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}\Biggr),
\tag{A.1}
\]

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has no root 2-overlap.

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair.

\[
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases}\\
L_{\langle P_{Aa}, P_{Ab}\rangle} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A.2}
\]

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
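As a small illustration of the Type 3 term in the formula for $\Phi_{aab}$, consider a hypothetical pedigree (not one of the paper's figures) in which a non-inbred founder $A$ is the only known parent of both $f$ and $b$, and $f$ is the only known parent of $a$. The sketch below assumes $\Phi_{AA} = 1/2$ for a non-inbred founder and the standard kinship recursion $\Phi_{ab} = \frac{1}{2}(\Phi_{fb} + \Phi_{mb})$; both are textbook facts rather than statements taken from this appendix.

```latex
% Hypothetical pedigree: A -> f -> a and A -> b, with A a non-inbred founder.
% Only one path leads from A to a, so every path-pair <P_{Aa_1}, P_{Aa_2}>
% is mergeable and only the Type 3 term contributes.
\begin{align*}
L_{\langle P_{Aa}, P_{Ab}\rangle} &= L_{P_{Aa}} + L_{P_{Ab}} = 2 + 1 = 3,\\
\Phi_{aab} &= \left(\tfrac{1}{2}\right)^{3+1}\Phi_{AA}
            = \tfrac{1}{16}\cdot\tfrac{1}{2} = \tfrac{1}{32}.
\end{align*}
% Cross-check with the recursive formula (4), where \Phi_{fmb} = 0
% because the second parent m of a is missing:
\[
\Phi_{aab} = \tfrac{1}{2}\Phi_{ab}
           = \tfrac{1}{2}\cdot\tfrac{1}{2}\Phi_{fb}
           = \tfrac{1}{4}\cdot\tfrac{1}{2}\Phi_{Ab}
           = \tfrac{1}{8}\cdot\tfrac{1}{2}\Phi_{AA}
           = \tfrac{1}{32}.
\]
```

Both routes give $1/32$, which is what the exponent $L_{\langle P_{Aa}, P_{Ab}\rangle} + 1$ in the Type 3 term predicts for this single-path case.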

A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

\[
\begin{aligned}
\Phi_{aabc} ={}& \sum_{A}\Biggl(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{quad}}-1}\Phi_{AAAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{quad}}}\Phi_{AAA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{quad}}+1}\Phi_{AA}\Biggr)\\
&+ \sum_{A}\Biggl(\sum_{\text{Type 4}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+1}\Phi_{AAA} + \sum_{\text{Type 5}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+2}\Phi_{AA}\Biggr),
\end{aligned}
\tag{A.3}
\]

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.

Case 2: one root 3-overlap path $P_{At}$ ending at $t$.

Case 3: one root 2-overlap and one root 3-overlap paths $P_{As}$ and $P_{At}$ ending at $s$ and $t$, respectively.

(A.4)

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path.

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

\[
\begin{aligned}
L_{\text{quad}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases}\\
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}.
\end{cases}
\end{aligned}
\tag{A.5}
\]


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 bc}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as a default mergeable path-pair. Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

\[
\Phi_{aaab} = \sum_{A}\Biggl(\frac{3}{2}\Biggl(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\Biggr) + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\Biggr),
\tag{A.6}
\]

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path.

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.

\[
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1},\\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases}\\
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A.7}
\]

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Cases 2.1, 2.2, 2.3, 3.1, and 3.2, together with $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$).

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

For $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.

Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

For $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

(B.1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Edge legend: parent-child relationship; ancestor-descendant relationship.)

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^2\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^2\Phi_{AAc} = (1/2)^3\Phi_{AAA}$.

Using the path-counting formula (12): if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}}(1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc} = (1/2)^2\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = (1/2)^3\Phi_{AAA}$.

Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k+1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}}(1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

\[
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA},\\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}.
\end{aligned}
\tag{B.2}
\]

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}}(1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1}\Phi_{ccc} = \sum_{\text{Type 1}}(1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc}$.

Thus, it is true for $n = k + 1$.
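The basis values for Figure 18 can be reproduced numerically with a small recursive evaluator. The sketch below is not the authors' implementation: it uses an allele-sampling recursion that is equivalent to the recursive formulas (3) and (4) for a single group of individuals, it assumes that a missing parent behaves as an unrelated founder (matching $\Phi_{mbc} = 0$ in the proof), and the names `make_phi`, `fig18a`, and `fig18b` are hypothetical.

```python
from fractions import Fraction

def make_phi(parents):
    """parents maps an individual to (father, mother); None marks a missing parent."""

    def depth(x):
        known = [p for p in parents.get(x, (None, None)) if p is not None]
        return 0 if not known else 1 + max(depth(p) for p in known)

    def phi(*inds):
        # Assumption: a missing parent is an unrelated founder, so it can only
        # be identical by descent with itself.
        if None in inds:
            return Fraction(1) if len(inds) == 1 else Fraction(0)
        if len(inds) == 1:
            return Fraction(1)
        a = max(inds, key=depth)      # deepest individual: an ancestor of no other
        n = inds.count(a)             # how many alleles are drawn from a
        rest = tuple(x for x in inds if x != a)
        f, m = parents.get(a, (None, None))
        same = Fraction(1, 2) ** n    # all n draws hit the paternal (or the maternal) allele
        mixed = 1 - 2 * same          # draws hit both of a's alleles
        total = same * (phi(f, *rest) + phi(m, *rest))
        if mixed:
            total += mixed * phi(f, m, *rest)
        return total

    return phi

# Figure 18(a): c is the only known parent of a and of b.
fig18a = make_phi({"a": ("c", None), "b": ("c", None)})
# Figure 18(b): A is the only known parent of a, b, and c.
fig18b = make_phi({"a": ("A", None), "b": ("A", None), "c": ("A", None)})

print(fig18a("a", "b", "c"), fig18a("c", "c", "c"))  # 1/16 1/4
print(fig18b("a", "b", "c"), fig18b("A", "A", "A"))  # 1/32 1/4
```

With a non-inbred founder, $\Phi_{ccc} = \Phi_{AAA} = 1/4$, so the printed values agree with $(1/2)^2\Phi_{ccc}$ and $(1/2)^3\Phi_{AAA}$ from the basis step.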

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.


Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2\Phi_{bc} = (1/2)^3\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2\Phi_{bc} = (1/2)^4\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2\Phi_{ssc} = (1/2)^3\Phi_{sc} = (1/2)^5\Phi_{AA}$.

Using the path-counting formula (12): if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}}(1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}+1}\Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^3\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4\Phi_{AA}$ and $\Phi_{abc} = (1/2)^5\Phi_{AA}$, respectively.

Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}}(1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

\[
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA},\\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}.
\end{aligned}
\tag{B.3}
\]

For Figures 21(b) and 21(c), we take the same steps as we calculate $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}}(1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,

\[
\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}).
\tag{B.4}
\]

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P}(1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P}(1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA} = \sum_{P}(1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, it shows that the path-counting formula (A.1) is true for Case 2.3.
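Wright's formula, on which this case rests, can be sketched by brute-force path enumeration. The code below is an illustration only, suitable for small pedigrees; it does not use NodeCodes or any path-encoding scheme, and the pedigree layout and the names `kinship`, `paths_up`, and `ancestors` are assumptions introduced here. A path-pair counts as nonoverlapping when its two paths share no individual other than the common ancestor.

```python
from fractions import Fraction
from itertools import product

def parents_of(ped, x):
    return [p for p in ped.get(x, ()) if p is not None]

def ancestors(ped, x):
    """x together with all of its ancestors."""
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents_of(ped, v))
    return seen

def paths_up(ped, x, anc):
    """All descent paths between anc and x, listed as node tuples from x up to anc."""
    if x == anc:
        return [(x,)]
    return [(x,) + rest for p in parents_of(ped, x) for rest in paths_up(ped, p, anc)]

def kinship(ped, a, b):
    """Phi_ab by path counting: sum of (1/2)^(L_PAa + L_PAb) * Phi_AA."""
    if a == b:
        f, m = ped.get(a, (None, None))
        inbreeding = kinship(ped, f, m) if f is not None and m is not None else Fraction(0)
        return Fraction(1, 2) * (1 + inbreeding)
    total = Fraction(0)
    for A in ancestors(ped, a) & ancestors(ped, b):
        phi_AA = kinship(ped, A, A)
        for pa, pb in product(paths_up(ped, a, A), paths_up(ped, b, A)):
            if set(pa) & set(pb) == {A}:          # nonoverlapping path-pair
                edges = (len(pa) - 1) + (len(pb) - 1)
                total += Fraction(1, 2) ** edges * phi_AA
    return total

# Hypothetical pedigrees used only to exercise the sketch.
sibs = {"s1": ("F", "M"), "s2": ("F", "M")}
cousins = {"p1": ("G", "H"), "p2": ("G", "H"), "c1": ("p1", "x"), "c2": ("p2", "y")}
one_parent = {"X": ("F", None), "a": ("X", None), "b": ("X", None)}

print(kinship(sibs, "s1", "s2"))       # 1/4
print(kinship(cousins, "c1", "c2"))    # 1/16
print(kinship(one_parent, "a", "b"))   # 1/8
```

In the last pedigree the path-pair through $F$ is rejected because both paths also pass through $X$, so only the pair rooted at $X$ contributes. Multiplying any of these values by $1/2$ gives the Type 3 contribution to $\Phi_{aab}$ in the single-parent setting of Figure 22(a).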

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).


Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Edge legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. Proof by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, (12) and (A.1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$, which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \to \cdots \to a,\ t_b\colon A_1 \to \cdots \to b,\ t_c\colon A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, we know that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
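A minimal worked instance of case (1) may help. The pedigree is hypothetical (not one of the paper's figures): $a$, $b$, and $c$ are full siblings whose parents $F$ and $M$ are unrelated, non-inbred founders, so $F$ and $M$ are the two triple-common ancestors and neither descends from the other. The value $\Phi_{FFF} = \Phi_{MMM} = 1/4$ for a non-inbred founder is assumed here as a standard fact (three independent draws from two distinct alleles coincide with probability $1/4$).

```latex
% Take A_1 = F. The only path-triple from F is <F -> a, F -> b, F -> c>,
% which has no root overlap (Type 1), so
\[
S(F) = \left(\tfrac{1}{2}\right)^{1+1+1}\Phi_{FFF}
     = \tfrac{1}{8}\cdot\tfrac{1}{4} = \tfrac{1}{32}.
\]
% Removing F leaves G' with M as the only triple-common ancestor:
\[
\Phi_{abc}(G') = \left(\tfrac{1}{2}\right)^{3}\Phi_{MMM} = \tfrac{1}{32},
\qquad
\Phi_{abc}(G) = \Phi_{abc}(G') + S(F) = \tfrac{1}{32} + \tfrac{1}{32} = \tfrac{1}{16}.
\]
% Direct check: each sibling's sampled allele is any one of the four founder
% alleles with probability 1/4, so all three coincide with probability
\[
4\cdot\left(\tfrac{1}{4}\right)^{3} = \tfrac{1}{16}.
\]
```

The two contributions add without interaction because no eligible path-triple rooted at $F$ passes through $M$, which is the independence used in case (1).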

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

For $\Phi_{aaab}$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.

Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.

For $\Phi_{aabc}$:

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.

Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlap.

Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.

Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.

Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

For $\Phi_{abcd}$:

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.

Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.

Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlap.

Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.

Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.

(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

Figure 23: Dependency graph for different cases for four individuals (nodes: Cases 2.1 to 2.3, Cases 3.1 to 3.5, Cases 4.1 to 4.3.3, $\Phi_{AAA}$, and $\Phi_{AA}$).

According to the dependency graph in Figure 23, the intermediate steps including Cases 3.4 and 3.5 are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pair regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlap.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlap.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlap and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8, accordingly.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Figure 24: Dependency graph for different cases for two pairs of individuals. (Nodes: Case 4.1, Case 4.2, Case 4.3.1, Case 4.3.2, Case 4.4, Case 4.6, Case 4.7, Case 4.8, $\Phi_{AA}$, $\Phi_{TT}$, $\Phi_{AAA}$, and $\Phi_{AAAA}$.)

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malecot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


Figure 1: The 15 possible identity states for individuals $a$ and $b$, grouped by their 9 condensed states $\Delta_1$–$\Delta_9$. Each state shows the maternal and paternal alleles of $a$ and of $b$. Lines indicate alleles that are IBD.

Figure 2: Examples of path-pairs and path-triples. (The pedigree contains the ancestor $A$ and the individuals $c$, $s$, $d$, $e$, $f$, $t$, $a$, and $b$; the figure lists path-pair1 to path-pair4 and path-triple1 to path-triple6 together with their crossover, 2-overlap, and 3-overlap individuals.)

extends all the way to the ancestor $A$, we call it a root 2-overlap path.

(14) 3-Overlap Path. It consists of all 3-overlap individuals in a consecutive order. If the 3-overlap path extends all the way to the root $A$, we call it a root 3-overlap path.

Example 1. Consider the path-pairs from $A$ to $a$ and $b$ in Figure 2, where $A$ is a common ancestor of $a$ and $b$. For path-pair1, $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) = \{s, e, t\}$, and $A \to s \to e \to t$ is a root 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$. For path-pair4, $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) = \{e, t\}$, where $e$ is a crossover individual, $t$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, and $e \to t$ is a 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$; it is not a root 2-overlap path because it does not start at $A$.
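The labels used in Example 1 can be reproduced with a short sketch. It assumes (consistently with the examples, but not quoted from the definitions) that a shared individual is a crossover individual when the two paths enter it from different parents, and a 2-overlap individual when they enter it along the same edge; the function name `label_shared` and the list-of-names path encoding are illustrative only.

```python
def label_shared(path_a, path_b):
    """Label each individual shared by two paths that start at the same ancestor.

    Assumption: a shared individual entered from different parents is a
    crossover; one entered along the same edge is a 2-overlap individual,
    and it is 'root' if the shared run of edges starts at the ancestor.
    """
    root = path_a[0]
    assert path_b[0] == root, "both paths must start at the common ancestor"
    # Predecessor of every individual on each path.
    pred_a = {child: parent for parent, child in zip(path_a, path_a[1:])}
    pred_b = {child: parent for parent, child in zip(path_b, path_b[1:])}
    labels = {}
    for node in path_a[1:]:
        if node not in pred_b:
            continue  # not shared
        if pred_a[node] != pred_b[node]:
            labels[node] = "crossover"
        elif pred_a[node] == root or labels.get(pred_a[node]) == "root 2-overlap":
            labels[node] = "root 2-overlap"
        else:
            labels[node] = "2-overlap"
    return labels


# Path-pair1 and path-pair4, written as the paths given in (8) and (11).
pair1 = label_shared(list("Aseta"), list("Asetb"))
pair4 = label_shared(list("Aceta"), list("Asetb"))
```

Under these assumptions, `pair1` labels $s$, $e$, and $t$ as root 2-overlap individuals, while `pair4` labels $e$ as a crossover individual and $t$ as a (non-root) 2-overlap individual.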

Example 2. There are four path-quads listed in Figure 3 from $A$ to four individuals $a$, $b$, $c$, and $d$, where $A$ is a quad-common ancestor of the four individuals. For path-quad2, considering the paths $P_{Aa}$ and $P_{Ab}$, the path $A \to t \to f \to s$ is a root 2-overlap path; $t$, $f$, $s$ are 2-overlap individuals with respect to $P_{Aa}$ and $P_{Ab}$. For path-quad3, $t$, $f$, $s$ are 3-overlap individuals with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, and the path $A \to t \to f \to s$ is a root 3-overlap path.

Then we summarize all the conceptual terms used in the path-counting formulas for two, three, and four individuals in Table 1, which gives a glimpse, from the terminology side, of our framework for generalizing Wright's formula to three and four individuals.

Figure 3: Examples of path-quads. (The pedigree contains the ancestor $A$ and the individuals $c$, $s$, $d$, $t$, $f$, $m$, $a$, and $b$; the figure lists path-quad1 to path-quad4.)

Table 1: The conceptual terms used for two, three, and four individuals.

| Two individuals | Three individuals | Four individuals |
| --- | --- | --- |
| Common ancestor | Triple-common ancestor | Quad-common ancestor |
| Path-pair | Path-triple | Path-quad |
| $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ | $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$ | $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$ |
| N/A | 2-Overlap individual | 3-Overlap individual |
| N/A | 2-Overlap path | 3-Overlap path |
| N/A | Root 2-overlap path | Root 3-overlap path |
| N/A | Crossover individual | Crossover individual |

2.4. An Overview of Path-Counting Formula Derivation. According to Wright's path-counting formula [16] (see (2)) for two individuals $a$ and $b$, the path-counting approach requires identifying common ancestors of $a$ and $b$ and calculating the contribution of each common ancestor to $\Phi_{ab}$. More specifically, for each common ancestor, denoted as $A$, we obtain all path-pairs from $A$ to $a$ and $b$ and identify acceptable path-pairs. For $\Phi_{ab}$, an acceptable path-pair $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair, where the two paths share no common individuals except $A$. In Figure 2, path-pair2 is an acceptable path-pair, while path-pair1, path-pair3, and path-pair4 are not acceptable path-pairs. The contribution of each common ancestor $A$ to $\Phi_{ab}$ is computed based on the inbreeding coefficient of $A$, modified by the length of each acceptable path-pair.
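The two-individual procedure described above can be sketched in a few lines. The sketch assumes the standard form of Wright's formula, in which each nonoverlapping path-pair of total length $L$ from a common ancestor $A$ contributes $(1/2)^{L+1}(1 + F_A)$; the pedigree encoding (a dict from child to a pair of parents) and all function names are hypothetical, and exhaustive path enumeration is only practical for small pedigrees.

```python
from fractions import Fraction
from functools import lru_cache


def make_kinship(parents):
    """Build a kinship function from `parents`: child -> (father, mother).

    Founders are simply absent from `parents`. Illustrative encoding only.
    """
    children = {}
    for child, pair in parents.items():
        for p in pair:
            if p is not None:
                children.setdefault(p, []).append(child)

    def paths(src, dst):
        # All downward paths src -> ... -> dst, as tuples of individuals.
        if src == dst:
            return [(src,)]
        return [(src,) + tail
                for ch in children.get(src, [])
                for tail in paths(ch, dst)]

    def ancestors(x):
        # x itself counts, so that a parent-child pair is handled.
        seen, stack = {x}, [x]
        while stack:
            for p in parents.get(stack.pop(), (None, None)):
                if p is not None and p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    @lru_cache(maxsize=None)
    def inbreeding(x):
        f, m = parents.get(x, (None, None))
        if f is None or m is None:
            return Fraction(0)
        return kinship(f, m)  # F_x is the kinship of x's parents

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a == b:
            return (1 + inbreeding(a)) / 2
        total = Fraction(0)
        for A in ancestors(a) & ancestors(b):
            for pa in paths(A, a):
                for pb in paths(A, b):
                    if set(pa) & set(pb) == {A}:  # nonoverlapping path-pair
                        length = (len(pa) - 1) + (len(pb) - 1)
                        total += Fraction(1, 2) ** (length + 1) * (1 + inbreeding(A))
        return total

    return kinship


# Two full sibs a and b with founder parents p and q, and their child x.
kin = make_kinship({"a": ("p", "q"), "b": ("p", "q"), "x": ("a", "b")})
```

For this small pedigree, `kin("a", "b")` and `kin("p", "a")` both evaluate to 1/4, and `kin("x", "x")` evaluates to 5/8 because $x$ has inbreeding coefficient 1/4.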

To compute $\Phi_{abc}$, the path-counting approach requires identifying all triple-common ancestors of $a$, $b$, and $c$ and summing up all triple-common ancestors' contributions to $\Phi_{abc}$. For each triple-common ancestor, denoted as $A$, we first identify all path-triples, each of which consists of three paths from $A$ to $a$, $b$, and $c$, respectively. Some examples of path-triples are presented in Figure 2.

For $\Phi_{ab}$, only nonoverlapping path-pairs are acceptable. A path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. For $\Phi_{abc}$, a path-triple might be acceptable even though either 2-overlap individuals or crossover individuals exist between a path-pair. The main challenge we need to address is finding necessary and sufficient conditions for acceptable path-triples.

Aiming at solving the problem of identifying acceptable path-triples, we first use a systematic method to generate all possible cases for a path-pair by considering different types of common individuals shared between the two paths. Then we introduce building blocks, which are connected graphs with conditions on every edge that encapsulate a set of acceptable cases of path-pairs. In each building block, we represent paths as nodes and interactions (i.e., shared common individuals between two paths) as edges. There are at least two paths in a building block. For each building block, we obtain all acceptable cases for the concerned path-pairs. A given path-triple can be decomposed into one or multiple building blocks. Considering a shared path-pair between two building blocks, we use the natural join operator from relational algebra to match the acceptable cases for the shared path-pair between the two building blocks. In other words, considering the acceptable cases for building blocks as inputs, we use the natural join operator to construct all acceptable cases for a path-triple. Acceptable cases for a path-triple are identified and then used in deriving the path-counting formula for $\Phi_{abc}$.

Then we summarize all the main procedures used for deriving the path-counting formula for $\Phi_{abc}$ in a flowchart shown in Figure 4. The main procedures are also applicable for deriving the path-counting formulas for $\Phi_{abcd}$ and $\Phi_{ab,cd}$.

Figure 4: A flowchart for path-counting formula derivation. (Path-pair branch: generate all cases for $\langle P_{Aa}, P_{Ab} \rangle$; identify nonoverlapping path-pairs and compute their contribution to $\Phi_{ab}$; identify acceptable cases for $\langle P_{Aa}, P_{Ab} \rangle$ in the context of a path-triple. Path-triple branch: path-pair level representation of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$; decomposition into a set of building blocks; sets of acceptable cases for each building block; natural join; acceptable cases for the path-triple; if a path-pair has a crossover, apply the split operator; if a path-pair has a root overlap, the path-triple belongs to Type 2, otherwise to Type 1; compute its contribution to $\Phi_{abc}$.)

3. Results and Discussion

3.1. Path-Counting Formulas for Three Individuals. We first introduce a systematic method to generate all possible cases for a path-pair. Then we discuss building blocks for path-triples and identify all acceptable cases, which are used in deriving the path-counting formula for $\Phi_{abc}$.

3.1.1. Cases for a Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab} \rangle$ with $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathit{NULL}$, where $A$ is a common ancestor of $a$ and $b$ and $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ consists of all common individuals shared between $P_{Aa}$ and $P_{Ab}$, except $A$, we introduce three patterns (i.e., crossover, 2-overlap, and root 2-overlap) to generate all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$.

(1) $X(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ share one or multiple crossover individuals.

(2) $T(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are root 2-overlapping from $A$, and the root 2-overlap path can have one or multiple 2-overlap individuals.

(3) $Y(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are overlapping but not from $A$, and the 2-overlap path can have one or multiple 2-overlap individuals.

Based on the three patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$, we use regular expressions to generate all possible cases for the path-pair $\langle P_{Aa}, P_{Ab} \rangle$. For convenience, we drop $\langle P_{Aa}, P_{Ab} \rangle$ and use $X$, $T$, and $Y$ instead of the patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$ whenever there is no confusion. When $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathit{NULL}$, the eight cases shown in (7) cover all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$. The completeness of the eight cases shown in (7) can be proved by induction on the total number of $T$, $X$, and $Y$ appearing in $\langle P_{Aa}, P_{Ab} \rangle$. Using the pedigree in Figure 2, Cases 1–3 and Case 6 are illustrated in (8), (9), (10), and (11).

Case 1: $T$
Case 2: $X^{+}$
Case 3: $TX^{+}$
Case 4: $T(X^{+}Y)^{+}$
Case 5: $T(X^{+}Y)^{+}X^{+}$
Case 6: $X^{+}Y$
Case 7: $X^{+}(YX^{+})^{+}$
Case 8: $X^{+}(YX^{+})^{+}Y$
(7)
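The eight cases in (7) translate directly into regular expressions, which makes it easy to check on short strings that they are mutually exclusive and jointly exhaustive. The sketch below is illustrative: the overall grammar `T?(X+Y)*X*` is an assumed reading of the constraints (a $T$ can only come first, and every $Y$ is immediately preceded by at least one $X$), and the helper names are hypothetical.

```python
import re
from itertools import product

# The eight cases of (7), written as Python regular expressions.
CASES = {
    1: r"T",
    2: r"X+",
    3: r"TX+",
    4: r"T(X+Y)+",
    5: r"T(X+Y)+X+",
    6: r"X+Y",
    7: r"X+(YX+)+",
    8: r"X+(YX+)+Y",
}

# Assumed grammar of well-formed pattern strings (non-empty).
WELL_FORMED = r"T?(X+Y)*X*"


def matching_cases(pattern):
    """Return the numbers of all cases in (7) that the pattern string matches."""
    return [n for n, rx in CASES.items() if re.fullmatch(rx, pattern)]


def check_partition(max_len):
    """Check that every well-formed string up to max_len matches exactly one case."""
    for n in range(1, max_len + 1):
        for letters in product("TXY", repeat=n):
            s = "".join(letters)
            expected = 1 if re.fullmatch(WELL_FORMED, s) else 0
            if len(matching_cases(s)) != expected:
                return False
    return True
```

For instance, `matching_cases("TX")` gives `[3]` and `matching_cases("XY")` gives `[6]`, in line with the examples (9) and (11), while a string such as `"YX"` matches no case.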

$$A \to s \to e \to t \to a, \quad A \to s \to e \to t \to b \;\in\; T, \qquad (8)$$

where $s$, $e$, $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path.

$$A \to s \to e \to t \to a, \quad A \to s \to f \to t \to b \;\in\; TX, \qquad (9)$$

where $s$ is a 2-overlap individual and the overlap path is a root 2-overlap path; $t$ is a crossover individual.

$$A \to s \to e \to t \to a, \quad A \to d \to f \to t \to b \;\in\; X, \qquad (10)$$

where $t$ is a crossover individual.

$$A \to c \to e \to t \to a, \quad A \to s \to e \to t \to b \;\in\; XY, \qquad (11)$$

where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.

Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ (scenarios $S_0$–$S_3$ over the nodes $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$).

3.1.2. Path-Pair Level Graphical Representation of a Path-Triple. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, we represent each path as a node. The path-triple can be decomposed into three path-pairs (i.e., $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual) except $A$, then there is an edge between the two nodes representing the two paths. Therefore, we obtain four different scenarios $S_0$–$S_3$, shown in Figure 5.

In Figure 5, the scenario $S_0$ has no edges, which means that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths. In Figure 2, path-triple1 is an example of $S_0$. Next, we introduce a lemma which assists with identifying the options for the edges in the scenarios $S_1$–$S_3$.

Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. If there is a 2-overlap edge, which is represented by $Y$ in the regular expression representation of any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no contribution to $\Phi_{abc}$.

Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at allele-level, starting from a triple-common ancestor $A$ to the three individuals $a$, $b$, and $c$.

Figure 6: Examples of pedigree and inheritance paths. (a) Pedigree. (b) Inheritance paths. (Both panels contain the ancestor $A$, the individuals $p_1$–$p_8$, and $a$, $b$, $c$.)

For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ listed as follows: $P_{Aa}$: $A \to a$; $P_{Ab}$: $A \to p_3 \to p_6 \to p_7 \to b$; $P_{Ac}$: $A \to p_4 \to p_6 \to p_7 \to c$.

For $\langle P_{Ab}, P_{Ac} \rangle$, $p_6$ is a crossover individual, $p_7$ is an overlap individual, and $p_6 \to p_7$ is a 2-overlap edge, represented by $Y$ in the regular expression representation (see the definition of $Y$ in Section 3.1.1).

For the individual $p_6$, let us denote the two alleles at one fixed autosomal locus as $g_1$ and $g_2$. At allele-level, only one allele can be passed down from $p_6$ to $p_7$. Since $p_3$ and $p_4$ are the parents of $p_6$, $g_1$ is passed down from one parent and $g_2$ is passed down from the other parent. It is infeasible to pass down both $g_1$ and $g_2$ from $p_6$ to $p_7$. In other words, there are no corresponding inheritance paths for the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with a 2-overlap edge between $\langle P_{Ab}, P_{Ac} \rangle$ (i.e., Case 6: $XY$). Therefore, path-triples of this kind have no contribution to $\Phi_{abc}$.

Figure 6(b) shows one example of eligible inheritance paths corresponding to a pedigree graph. Each individual is represented by two allele nodes. The eligible inheritance paths in Figure 6(b) consist of red edges only.

Only Case 1, Case 2, and Case 3 do not have $Y$ in the regular expression representation of a path-pair (see (7)). Considering the scenarios $S_1$–$S_3$ shown in Figure 5, an edge can therefore have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$.

3.1.3. Constructing Cases for a Path-Triple. For the scenarios $S_1$–$S_3$ in Figure 5, we define two building blocks $\{B_1, B_2\}$, along with some rules in Figure 7, to generate acceptable cases. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, we cannot allow both edges to be root overlap, because if two edges are root overlap, then $P_{Aa}$ and $P_{Ac}$ must share at least one common individual except $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

Figure 7: Building blocks $\{B_1, B_2\}$ and basic rules. $B_1$ consists of $P_{Aa}$ and $P_{Ab}$ joined by one edge; the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. $B_2$ consists of $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$ joined by two edges; at most one edge can belong to root overlap (either $T$ or $TX$).

Figure 8: A graphical illustration for obtaining $T_3$. The scenario $S_3$, with edges $e_1$, $e_2$, and $e_3$, is decomposed into $u_1$, $u_2$, and $u_3$, and $T_3 = R_1 \bowtie R_2 \bowtie R_3$. Note: $R_i$ denotes all acceptable path-triples for $u_i$.

Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ into $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples, denoted as $R_i$.

Considering the dependency among $R_1$, $R_2$, and $R_3$, we use the natural join operator, denoted as $\bowtie$, operating on $R_1$, $R_2$, and $R_3$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in the scenario $S_3$.

For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. The scenario $S_0$ has no edges, which shows that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or
(2) one edge belongs to root 2-overlap and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but zero 2-overlap paths.
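The construction of $T_3$ by natural join can be illustrated with a small sketch. It treats each acceptable case as a mapping from edge names to labels, uses the rule from Figure 7 that at most one edge of $B_2$ may be a root overlap ($T$ or $TX$), and assumes, for illustration only, that $u_1$, $u_2$, and $u_3$ cover the edge pairs $(e_1, e_2)$, $(e_2, e_3)$, and $(e_1, e_3)$; the function names are hypothetical.

```python
from itertools import product

LABELS = ("T", "X", "TX")   # the three options for an edge
ROOT = {"T", "TX"}          # labels that involve a root 2-overlap


def b2_relation(edge1, edge2):
    """Acceptable label assignments for a two-edge building block B2."""
    return [
        {edge1: l1, edge2: l2}
        for l1, l2 in product(LABELS, repeat=2)
        if not (l1 in ROOT and l2 in ROOT)  # at most one root-overlap edge
    ]


def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    joined = []
    for x in r:
        for y in s:
            shared = x.keys() & y.keys()
            if all(x[k] == y[k] for k in shared):
                joined.append({**x, **y})
    return joined


# Assumed decomposition of S3 into three overlapping B2 blocks.
R1 = b2_relation("e1", "e2")
R2 = b2_relation("e2", "e3")
R3 = b2_relation("e1", "e3")
T3 = natural_join(natural_join(R1, R2), R3)
```

Under these assumptions `T3` contains 7 assignments: the one with all three edges labeled $X$, and six in which exactly one edge is $T$ or $TX$ and the other two are $X$, which matches options (1) and (2) above.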

3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals into a noncrossover path-triple without changing the contribution from this path-triple to $\Phi_{abc}$. The main purpose of using the splitting operator is to simplify the path-counting formula derivation process. We first use an example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ into two nodes $s_1$ and $s_2$;

(2) transform the edges $s \to a'$ and $s \to b'$ to $s_1 \to a'$ and $s_2 \to b'$, respectively;

(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
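The three steps can be sketched on a graph stored as a mapping from each individual to the set of its children. This is a minimal illustration, not the authors' implementation: the function name, the tuple names `(s, 1)` and `(s, 2)` for the two copies, and the rewiring of the incoming edges ($x \to s_1$ and $w \to s_2$, read off the paths listed in Figure 9) are assumptions.

```python
def split_crossover(children, s, x, w):
    """Return a copy of the graph in which crossover individual s is split.

    children: dict mapping each individual to the set of its children.
    x, w: the parents through which the two paths enter s
          (x on the path to a, w on the path to b).
    """
    s1, s2 = (s, 1), (s, 2)
    new = {}
    for node, kids in children.items():
        if node == s:
            continue
        kids = set(kids)
        if s in kids:
            kids.discard(s)
            if node == x:
                kids.add(s1)  # assumed: the path to a now enters s1
            if node == w:
                kids.add(s2)  # assumed: the path to b now enters s2
        new[node] = kids
    # Steps (2) and (3): both copies keep every outgoing edge of s, so
    # s1 -> a', s2 -> b', s2 -> a', and s1 -> b' are all present.
    new[s1] = set(children.get(s, ()))
    new[s2] = set(children.get(s, ()))
    return new


# A small graph in which s is a crossover between A->x->s->a1 and A->w->s->b1.
g = {"A": {"x", "w"}, "x": {"s"}, "w": {"s"}, "s": {"a1", "b1"}, "a1": set(), "b1": set()}
g_split = split_crossover(g, "s", "x", "w")
```

After the split, the two paths run through different copies of $s$ and share no individual other than $A$, while each copy can still reach both $a'$ and $b'$ (here `a1` and `b1`).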

Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.

Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If a new crossover node appears, the only possible node is either $a'$ or $b'$. Assume $b'$ becomes a crossover individual; it means that $b'$ is able to reach $a$ and $b$ from two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.

Next, we introduce a canonical graph, which results from applying the splitting operator to all crossover individuals. The canonical graph has zero crossover individuals.

Definition 5 (Canonical Graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, if there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$,

(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$,

then we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.

Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, there exists a canonical graph $G'$ for $G$.

Figure 9: Transforming pedigree graph $G_{k+1}$ having $k + 1$ crossovers to $G_k$ having $k$ crossovers. For $G_{k+1}$, $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ is $A \to \cdots \to x \to s \to a' \to \cdots \to a$, $A \to \cdots \to w \to s \to b' \to \cdots \to b$, and $A \to c$. For $G_k$, $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ is $A \to \cdots \to x \to s_1 \to a' \to \cdots \to a$, $A \to \cdots \to w \to s_2 \to b' \to \cdots \to b$, and $A \to c$. (The legend distinguishes ancestor-descendant relationships from parent-child relationships.)

Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ (scenarios $S_0$–$S_{10}$ over the nodes $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$).

Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4.

3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:

$$\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AA} \right), \qquad (12)$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2,} \end{cases} \qquad (13)$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).
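As a worked illustration of (12) and (13), the contribution of a single acceptable path-triple can be evaluated with exact fractions. The function and parameter names below are hypothetical; passing `L_As` marks the path-triple as Type 2, and omitting it marks it as Type 1.

```python
from fractions import Fraction


def phi_AA(F_A):
    """Phi_AA = (1/2)(1 + F_A)."""
    return Fraction(1, 2) * (1 + F_A)


def phi_AAA(F_A):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * F_A)


def triple_contribution(L_Aa, L_Ab, L_Ac, F_A=Fraction(0), L_As=None):
    """Contribution of one acceptable path-triple to Phi_abc, following (12)-(13).

    L_As is the length of the root 2-overlap path (Type 2); None means Type 1.
    """
    if L_As is None:
        L_triple = L_Aa + L_Ab + L_Ac
        return Fraction(1, 2) ** L_triple * phi_AAA(F_A)
    L_triple = L_Aa + L_Ab + L_Ac - L_As
    return Fraction(1, 2) ** (L_triple + 1) * phi_AA(F_A)


# Three full sibs of non-inbred, unrelated parents: each parent gives one
# Type 1 path-triple with three paths of length 1.
sibs = 2 * triple_contribution(1, 1, 1)

# A -> s -> a, A -> s -> b, A -> c: one root 2-overlap path A -> s of length 1.
type2 = triple_contribution(2, 2, 1, L_As=1)
```

Here `sibs` evaluates to 1/16 and `type2` to 1/64; the first value agrees with a direct allele-level calculation for three full sibs.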

For completeness the path-counting formula for Φ119886119886119887

isgiven in Appendix A and the correctness proof of the path-counting formula is given in Appendix B
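As an illustration of how (12) and (13) combine, the following is a minimal Python sketch. It assumes the path-triples have already been enumerated and classified as Type 1 or Type 2; that enumeration is not shown. The function names and the dictionary layout (`A`, `lengths`, `overlap`) are hypothetical and are not taken from the paper.

```python
def phi_AA(F_A):
    # Phi_AA = (1/2)(1 + F_A)
    return 0.5 * (1 + F_A)


def phi_AAA(F_A):
    # Phi_AAA = (1/4)(1 + 3 F_A)
    return 0.25 * (1 + 3 * F_A)


def phi_abc(path_triples, inbreeding):
    """Sum the contributions of pre-classified path-triples, following (12).

    path_triples: iterable of dicts (hypothetical layout) with keys
      'A'       : the triple-common ancestor,
      'lengths' : (L_PAa, L_PAb, L_PAc),
      'overlap' : length of the root 2-overlap path P_As for Type 2,
                  or None for Type 1.
    inbreeding: dict mapping an ancestor to its inbreeding coefficient F_A
                (missing ancestors are treated as non-inbred).
    """
    total = 0.0
    for triple in path_triples:
        F_A = inbreeding.get(triple["A"], 0.0)
        L_triple = sum(triple["lengths"])
        if triple["overlap"] is None:
            # Type 1: exponent L_triple, weight Phi_AAA
            total += 0.5 ** L_triple * phi_AAA(F_A)
        else:
            # Type 2: subtract the shared root path as in (13), exponent L_triple + 1
            L_triple -= triple["overlap"]
            total += 0.5 ** (L_triple + 1) * phi_AA(F_A)
    return total


# Three half-siblings a, b, c sharing one non-inbred parent A (paths of length 1).
half_sibs = [{"A": "A", "lengths": (1, 1, 1), "overlap": None}]

# A -> s -> a, A -> s -> b, A -> c: one root 2-overlap path A -> s of length 1.
with_overlap = [{"A": "A", "lengths": (2, 2, 1), "overlap": 1}]
```

Under these assumptions, the first example evaluates to $(1/2)^3 \cdot 1/4 = 1/32$ and the second to $(1/2)^5 \cdot 1/2 = 1/64$.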

3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios, $S_0$ to $S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks, $B_1$, $B_2$, and $B_3$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that for $B_3$, if $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.

3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 11, we first decompose $S_i$ into one or multiple building blocks. For a scenario $S_i \in \{S_1, S_3\}$, it has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$ because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.
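The Cartesian-product step $T_2 = R_1 \times R_2$ can be sketched in Python as follows. The case labels used here are placeholders chosen only for illustration; the actual acceptable cases for a building block come from the rules in Figure 7 and are not reproduced.

```python
from itertools import product


def combine_disconnected(R1, R2):
    """Return T = R1 x R2 for two disconnected building blocks.

    Because the blocks share no edges, every acceptable case of the first
    block can be paired with every acceptable case of the second block.
    """
    return list(product(R1, R2))


# Placeholder labels (hypothetical), standing in for the acceptable cases
# of the path-pairs in u1 and u2.
R1 = ["u1_case_a", "u1_case_b"]
R2 = ["u2_case_a", "u2_case_b", "u2_case_c"]
T2 = combine_disconnected(R1, R2)
```

The size of the result is the product of the sizes of the two inputs, which is why no conflict checking is needed for disconnected blocks.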


Figure 11: Building blocks ($B_1$, $B_2$, $B_3$) for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For $B_3$, all three edges belong to root overlap (i.e., having a root 3-overlap), with $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.

Table 2: Largest subgraph $S_j$ of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).

$S_i$ | $S_4$ | $S_5$ | $S_7$ | $S_8$ | $S_9$ | $S_{10}$
$S_j$ | $S_3$ | $S_3$ | $S_6$ | $S_5$ | $S_7$ | $S_9$

For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;

(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;

(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.

For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We then apply this idea for constructing all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;

(2) otherwise, there can be at most two root 2-overlap paths.
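The incremental construction described above follows the chain of largest subgraphs in Table 2. The short Python sketch below encodes only that table and derives the order in which the case sets $T_j$ would need to be built before $T_i$. The function name and the integer encoding of scenarios are assumptions made for illustration.

```python
# Table 2: scenario index i -> index j of its largest subgraph S_j.
LARGEST_SUBGRAPH = {4: 3, 5: 3, 7: 6, 8: 5, 9: 7, 10: 9}


def construction_chain(i):
    """Return the scenario indices to process, base scenario first, ending at i.

    Scenarios without a Table 2 entry (e.g. S_3 and S_6) are treated as base
    scenarios whose acceptable cases are obtained directly.
    """
    chain = [i]
    while chain[-1] in LARGEST_SUBGRAPH:
        chain.append(LARGEST_SUBGRAPH[chain[-1]])
    return chain[::-1]
```

For example, `construction_chain(10)` yields the order $S_6, S_7, S_9, S_{10}$, each step adding one edge, in line with $|E_i| - |E_j| = 1$.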

3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+2} \Phi_{AA} \right), \tag{14}$$

where $\Phi_{AA} = \frac{1}{2}(1 + F_A)$; $\Phi_{AAA} = \frac{1}{4}(1 + 3F_A)$; $\Phi_{AAAA} = \frac{1}{8}(1 + 7F_A)$; $F_A$ is the inbreeding coefficient of $A$; and $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;

Case 2: one root 3-overlap path $P_{At}$ ending at $t$;

Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively.

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \cdot L_{P_{At}} & \text{for Case 2} \in \text{Type 3,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3,} \end{cases} \tag{15}$$

and $L_{P_{Aa}}$ is the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).

For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
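The case analysis in (14) and (15) can be written out as a single-term evaluator. This is a rough Python sketch, assuming the path-quad has already been classified; the `kind` labels and parameter names are hypothetical, and the subtraction rules are copied from (15) as stated above.

```python
def phi_quad_term(lengths, kind, F_A=0.0, L_s=0, L_s2=0, L_t=0):
    """Contribution of one classified path-quad to Phi_abcd, per (14)-(15).

    lengths : (L_PAa, L_PAb, L_PAc, L_PAd)
    kind    : 'type1', 'type2', 'case1', 'case2' or 'case3'
              (the last three are the cases of Type 3)
    L_s, L_s2 : lengths of root 2-overlap paths; L_t: root 3-overlap path
    """
    total = sum(lengths)
    if kind == "type1":
        # exponent L_quad, weight Phi_AAAA = (1/8)(1 + 7 F_A)
        return 0.5 ** total * (1 + 7 * F_A) / 8
    if kind == "type2":
        # exponent L_quad + 1, weight Phi_AAA = (1/4)(1 + 3 F_A)
        return 0.5 ** (total - L_s + 1) * (1 + 3 * F_A) / 4
    if kind == "case1":
        L_quad = total - L_s - L_s2
    elif kind == "case2":
        L_quad = total - 2 * L_t
    elif kind == "case3":
        L_quad = total - L_t - L_s
    else:
        raise ValueError(f"unknown kind: {kind}")
    # Type 3: exponent L_quad + 2, weight Phi_AA = (1/2)(1 + F_A)
    return 0.5 ** (L_quad + 2) * (1 + F_A) / 2
```

As a sanity check under these assumptions, four individuals sharing a single non-inbred parent $A$ give one Type 1 path-quad with all lengths equal to 1, so the term is $(1/2)^4 \cdot 1/8 = 1/128$.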


Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with paths $A \to s \to a$, $A \to s \to b$, $A \to t \to c$, and $A \to t \to d$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with paths $A \to x \to a$, $A \to x \to d$, $A \to y \to b$, and $A \to y \to c$.

3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathit{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathit{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.

Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.

3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. Each edge takes one of the options defined in Section 3.1.1. Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap; or zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap; or zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \neq h$; or one root homo-overlap and two root heter-overlaps, and $h = r_1 \neq r_2$

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.

The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$\Phi_{ab,cd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AA} \right) + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB}, \tag{16}$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$ is a common ancestor of $a$ and $b$; and $T$ is a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).


Figure 13: Scenarios ($S_0$ to $S_{16}$) of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level.

Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

Type 3: either zero root homo-overlap and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively; or one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, with $r_1 \neq r_2$. (17)

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, with $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individual; $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individual. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$B = \begin{cases} S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\ S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents,} \\ T & \text{otherwise,} \end{cases}$$

$$L_{\text{2-pair}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \cdot L_{P_{Ah}} & \text{for Type 4,} \end{cases}$$

$$L_{\langle P_{Sa}, P_{Sb} \rangle} = L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5,}$$

$$L_{\langle P_{Tc}, P_{Td} \rangle} = L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5.} \tag{18}$$
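The generation function $\mathrm{gen}$ and the choice of $B$ for Type 5 can be sketched in Python as follows. The `parents` dictionary (individual mapped to a tuple of zero, one, or two known parents) and the function names are assumptions made for illustration; the paper does not prescribe a data structure.

```python
from functools import lru_cache


def make_gen(parents):
    """Build gen(x): 0 for a progenitor, otherwise 1 + max over known parents."""

    @lru_cache(maxsize=None)
    def gen(x):
        ps = parents.get(x, ())
        if not ps:
            return 0  # progenitor: no recorded parents
        return 1 + max(gen(p) for p in ps)

    return gen


def choose_B(S, T, parents):
    """Select B from S and T following the three-way rule for Type 5."""
    gen = make_gen(parents)
    if gen(S) < gen(T):
        return S
    if gen(S) == gen(T) and len(parents.get(T, ())) == 2:
        return S
    return T


# Hypothetical pedigree fragment: p and q are progenitors.
example_parents = {"S": ("p",), "T": ("p", "q"), "U": ("S", "q")}
```

In this fragment, `S` and `T` are both in generation 1 and `T` has two parents, so `choose_B("S", "T", example_parents)` selects `S`.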

Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \cdot \Phi_{TT}. \tag{19}$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.

3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by making comparisons with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node is a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph, and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $\mathrm{NC}(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows.

(1) If $u$ is $r$, then $\mathrm{NC}(r)$ contains only one element, the empty string.

(2) Otherwise, let $u$ be a node with $\mathrm{NC}(u)$, and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in $\mathrm{NC}(u)$, a code $x\,i\,{*}$ is added to $\mathrm{NC}(v_i)$, where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
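The two assignment rules can be sketched in Python as below. This is an illustrative sketch, not the implementation from [11] or [18]: the `children` and `gender` dictionaries, the `"."` separator between path steps, and the function name are assumptions. The traversal is breadth-first, but a child is only enqueued once all of its parents have been processed (a Kahn-style ordering), which is an assumption added here so that every parent's code set is complete before it is extended.

```python
from collections import defaultdict, deque


def assign_nodecodes(children, gender, root):
    """Assign NodeCodes to every node reachable from `root`.

    children : dict mapping a node to its list of children in sibling order
    gender   : dict mapping a node to a gender marker such as 'M' or 'F'
    Returns a dict mapping each node to its set of code strings.
    """
    # Count parents so a node is expanded only after all its parents are done.
    indegree = defaultdict(int)
    for kids in children.values():
        for v in kids:
            indegree[v] += 1

    nc = defaultdict(set)
    nc[root].add("")  # rule (1): the root carries only the empty string
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            for x in nc[u]:
                # rule (2): extend each code of u by sibling index and gender of v
                nc[v].add(f"{x}{i}{gender[v]}.")
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return dict(nc)


# Hypothetical pedigree: virtual root r, parents f and m, child c.
ped_children = {"r": ["f", "m"], "f": ["c"], "m": ["c"]}
ped_gender = {"r": "M", "f": "M", "m": "F", "c": "F"}
codes = assign_nodecodes(ped_children, ped_gender, "r")
```

Each code of a node corresponds to one path from the root, so `c` receives two codes here, one through `f` and one through `m`.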


The computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew as the pedigree size increased, from 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes scales well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. The improvement of NodeCodes over Recursive ranges from 86.48% to 91.30%.

Figure 14: The effect of pedigree size on computation efficiency improvement (x-axis: individuals in pedigree; y-axis: average time (ms); series: Recursive, NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (x-axis: depth; y-axis: average time (ms); series: Recursive, NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula for more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficients computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.

Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.


Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (scenarios $S_0$ to $S_3$, including the condensed form used when $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable).

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$\Phi_{aab} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}-1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle}+1} \Phi_{AA} \right), \tag{A.1}$$

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap.

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2,} \end{cases}$$

$$L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3.} \tag{A.2}$$

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
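To make the three types in (A.1) concrete, here is a small Python sketch that sums pre-classified terms for a single common ancestor. The tuple layout `(kind, L)` is a hypothetical encoding: for Types 1 and 2, `L` is $L_{\text{triple}}$ as given by (A.2) (already reduced by $L_{P_{As}}$ for Type 2); for Type 3, `L` is $L_{\langle P_{Aa}, P_{Ab} \rangle}$.

```python
def phi_aab(terms, F_A=0.0):
    """Sum the contributions of classified terms for one ancestor A, per (A.1).

    terms: iterable of (kind, L) with kind in {'type1', 'type2', 'type3'}.
    F_A  : inbreeding coefficient of A.
    """
    phi_AA = 0.5 * (1 + F_A)
    phi_AAA = 0.25 * (1 + 3 * F_A)
    total = 0.0
    for kind, L in terms:
        if kind == "type1":
            total += 0.5 ** (L - 1) * phi_AAA  # non-mergeable, no root 2-overlap
        elif kind == "type2":
            total += 0.5 ** L * phi_AA         # non-mergeable, one root 2-overlap
        elif kind == "type3":
            total += 0.5 ** (L + 1) * phi_AA   # mergeable pair, condensed to <P_Aa, P_Ab>
        else:
            raise ValueError(f"unknown kind: {kind}")
    return total


# Half-siblings a and b sharing one non-inbred parent A: the only path from A
# to a is A -> a, so <P_Aa1, P_Aa2> is mergeable and only Type 3 applies.
half_sib_terms = [("type3", 2)]
```

For this example the sketch gives $(1/2)^3 \cdot 1/2 = 1/16$, which agrees with evaluating Karigl's recursion $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$ with $\Phi_{ab} = 1/8$ and $\Phi_{fmb} = 0$.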

A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

$$\Phi_{aabc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}-1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AA} \right) + \sum_{A} \left( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+2} \Phi_{AA} \right), \tag{A.3}$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;

Case 2: one root 3-overlap path $P_{At}$ ending at $t$;

Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively. (A.4)

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3,} \end{cases}$$

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5.} \end{cases} \tag{A.5}$$


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula for $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ can be condensed to $\langle P_{Aa}, P_{Ab} \rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$, and the maternal allele is transmitted from $m$. At allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as a default mergeable path-pair.

Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

$$\Phi_{aaab} = \sum_{A} \left( \frac{3}{2} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}-1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} \right) + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{pair}}+2} \Phi_{AA} \right), \tag{A.6}$$

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has zero root 2-overlap path.

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1,} \\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2,} \end{cases}$$

$$L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3.} \tag{A.7}$$

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$).

B Proof for Path-Counting Formulas ofThree Individuals

Wefirst demonstrate that for one triple-common ancestor119860the path-counting computation of Φ

119886119887119888is equivalent to the

computation using recursive formulas Then we prove thecorrectness of the path-counting computation for multipletriple-common ancestors

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ that contribute to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases for $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap;

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap;

Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$.

Cases for $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap;

Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

(B1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Edge legend: parent-child relationship; ancestor-descendant relationship.)

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof. (Basis) There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^2 \Phi_{AAc} = (1/2)^3 \Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows:

$$
\sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}, \quad \text{where } L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}.
$$

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = (1/2)^3 \Phi_{AAA}$.
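The recursive side of this basis can be checked mechanically. The sketch below is a minimal implementation of the recursion for two and three individuals: the rules $\Phi_{abc} = \tfrac{1}{2}(\Phi_{fbc} + \Phi_{mbc})$ and $\Phi_{aab} = \tfrac{1}{2}(\Phi_{ab} + \Phi_{fmb})$ are the ones quoted in this appendix, while the base rules $\Phi_{aa} = \tfrac{1}{2}(1 + \Phi_{fm})$ and $\Phi_{aaa} = \tfrac{1}{4}(1 + 3\Phi_{fm})$ are assumed from Karigl's recursion [10]. The function name `make_phi`, the dictionary layout, and the use of string identifiers are illustrative choices.

```python
from fractions import Fraction
from functools import lru_cache


def make_phi(parents):
    """Return (phi2, phi3) for a pedigree given as {child: (father, mother)}.

    A missing parent is None; as in the proof, any coefficient that involves
    a missing individual is taken to be 0. Individuals are assumed to be strings.
    """

    @lru_cache(maxsize=None)
    def depth(x):
        known = [p for p in parents.get(x, (None, None)) if p is not None]
        return 1 + max(map(depth, known)) if known else 0

    def deepest_first(*people):
        # The deepest individual cannot be an ancestor of the others,
        # so the recursion may safely be expanded on it.
        return sorted(people, key=lambda x: (depth(x), x), reverse=True)

    @lru_cache(maxsize=None)
    def phi2(a, b):
        if a is None or b is None:
            return Fraction(0)
        a, b = deepest_first(a, b)
        f, m = parents.get(a, (None, None))
        if a == b:
            return (1 + phi2(f, m)) / 2
        return (phi2(f, b) + phi2(m, b)) / 2

    @lru_cache(maxsize=None)
    def phi3(a, b, c):
        if None in (a, b, c):
            return Fraction(0)
        a, b, c = deepest_first(a, b, c)
        f, m = parents.get(a, (None, None))
        if a == b == c:
            return (1 + 3 * phi2(f, m)) / 4
        if a == b:
            return (phi2(a, c) + phi3(f, m, c)) / 2
        return (phi3(f, b, c) + phi3(m, b, c)) / 2

    return phi2, phi3


# Figure 18(a): c is the only known parent of a and of b.
_, phi3_a = make_phi({"a": ("c", None), "b": ("c", None)})
# Figure 18(b): A is the only known parent of a, b, and c.
_, phi3_b = make_phi({"a": ("A", None), "b": ("A", None), "c": ("A", None)})
```

With these two pedigrees, `phi3_a("a", "b", "c")` equals $(1/2)^2 \Phi_{ccc}$ and `phi3_b("a", "b", "c")` equals $(1/2)^3 \Phi_{AAA}$, matching the basis above.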

Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \rightarrow a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2} \Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AAA}, \\
&\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
&\therefore\; \Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}.
\end{aligned} \tag{B2}
$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc}$.

Thus, it is true for $n = k + 1$.

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof. (Basis) There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.

Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^3 \Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^4 \Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2 \Phi_{ssc} = (1/2)^3 \Phi_{sc} = (1/2)^5 \Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows:

$$
\sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}, \quad \text{where } L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}
$$

and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{cc} = (1/2)^{2 + 1} \Phi_{cc} = (1/2)^3 \Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4 \Phi_{AA}$ and $\Phi_{abc} = (1/2)^5 \Phi_{AA}$, respectively.
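The path-counting side of the two bases reduces to simple arithmetic on path lengths. The sketch below encodes the Type 1 and Type 2 contributions as quoted above and applies the Type 2 rule to the three scenarios of Figure 20. The function names are hypothetical, and the path lengths are read off the figure caption and the recursive derivation (for Figure 20(c) this assumes that $s$ and $c$ are children of $A$ and that $a$ and $b$ are children of $s$).

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def type1_contribution(path_lengths, phi_AAA):
    """(1/2)^L * Phi_AAA with L = L_Aa + L_Ab + L_Ac (no root overlap)."""
    return HALF ** sum(path_lengths) * phi_AAA


def type2_contribution(path_lengths, overlap_length, phi_AA):
    """(1/2)^(L + 1) * Phi_AA with L = L_Aa + L_Ab + L_Ac - L_As."""
    length = sum(path_lengths) - overlap_length
    return HALF ** (length + 1) * phi_AA


phi_founder = Fraction(1, 2)  # Phi_AA of a non-inbred founder (assumed here)

figure20 = {
    # (a) c -> b -> a: paths c..a (2 edges), c..b (1), c..c (0); root overlap c -> b
    "a": type2_contribution((2, 1, 0), 1, phi_founder),
    # (b) A -> b -> a and A -> c: lengths 2, 1, 1; root overlap A -> b
    "b": type2_contribution((2, 1, 1), 1, phi_founder),
    # (c) A -> s -> {a, b} and A -> c: lengths 2, 2, 1; root overlap A -> s
    "c": type2_contribution((2, 2, 1), 1, phi_founder),
}
```

The three entries evaluate to $(1/2)^3 \Phi_{cc}$, $(1/2)^4 \Phi_{AA}$, and $(1/2)^5 \Phi_{AA}$ with $\Phi_{cc} = \Phi_{AA} = 1/2$, which is what the recursive formula gives in the text.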

Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$ (in Figure 21(a), $p = f$). Then we cut the edge $p \rightarrow a$ from $G$ and obtain a new graph $G^{*}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2} \Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA}, \\
&\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
&\therefore\; \Phi_{abc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}.
\end{aligned} \tag{B3}
$$

For Figures 21(b) and 21(c), we take the same steps as we did to calculate $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ to $\Phi_{aab}$ can be computed as follows:

$$
\sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}, \quad \text{where } L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}}.
$$

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,

$$
\Phi_{aab} = \frac{1}{2} \left( \Phi_{ab} + \Phi_{fmb} \right) = \frac{1}{2} \left( \Phi_{ab} + 0 \right) = \frac{1}{2} \Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B4}
$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$. Then we have

$$
\Phi_{aab} = \frac{1}{2} \Phi_{ab} = \frac{1}{2} \sum_{P} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA} = \sum_{P} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}.
$$

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, the path-counting formula (A1) is true for Case 2.3.
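Wright's formula, on which this case rests, can be sketched by brute-force enumeration of path-pairs. The code below is an illustrative, unoptimized version: it enumerates all downward paths from every candidate ancestor, keeps the pairs that share no individual except the ancestor, and sums $(1/2)^{L_{P_{Aa}} + L_{P_{Ab}}} \Phi_{AA}$. The function name and the pedigree encoding are assumptions, and this is not the encoding-based method the paper uses for large pedigrees.

```python
from fractions import Fraction


def kinship_by_paths(parents, a, b):
    """Phi_ab as a sum over nonoverlapping path-pairs (Wright's formula).

    parents maps each child to a (father, mother) tuple; None marks a
    missing parent. Exponential-time enumeration, for small examples only.
    """
    people = set(parents) | {p for ps in parents.values() for p in ps if p is not None}
    children = {}
    for child, ps in parents.items():
        for p in ps:
            if p is not None:
                children.setdefault(p, []).append(child)

    def paths(src, dst):
        # All directed paths src -> ... -> dst, each as a tuple of individuals.
        if src == dst:
            return [(src,)]
        return [(src,) + tail
                for child in children.get(src, ())
                for tail in paths(child, dst)]

    def phi(x, y):
        if x == y:
            f, m = parents.get(x, (None, None))
            inbreeding = phi(f, m) if None not in (f, m) else Fraction(0)
            return (1 + inbreeding) / 2
        total = Fraction(0)
        for anc in people:
            for px in paths(anc, x):
                for py in paths(anc, y):
                    if set(px) & set(py) == {anc}:  # nonoverlapping path-pair
                        length = (len(px) - 1) + (len(py) - 1)
                        total += Fraction(1, 2) ** length * phi(anc, anc)
        return total

    return phi(a, b)


# Hypothetical single-parent pedigree in the spirit of Figure 22(a):
# A -> f -> a and A -> b, with the second parent of every individual missing.
pedigree = {"f": ("A", None), "a": ("f", None), "b": ("A", None)}
phi_ab = kinship_by_paths(pedigree, "a", "b")
phi_aab = phi_ab / 2  # (B4): Phi_aab = (1/2) Phi_ab when m is missing
```

Here the only nonoverlapping path-pair is $\langle A \rightarrow f \rightarrow a,\ A \rightarrow b \rangle$ with length 3, so $\Phi_{ab} = (1/2)^3 \Phi_{AA}$ and $\Phi_{aab} = (1/2)^{3+1} \Phi_{AA}$.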

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A1).

Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Edge legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A1).

Proof. Proof by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, (12) and (A1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$, which has $k$ triple-common ancestors of $a$, $b$, and $c$. That is, $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \rightarrow \cdots \rightarrow a,\ t_b\colon A_1 \rightarrow \cdots \rightarrow b,\ t_c\colon A_1 \rightarrow \cdots \rightarrow c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
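The graph manipulation used in this induction step (find the triple-common ancestors, pick a topmost one, and remove it to obtain $G'$) can be sketched as follows. The helper names and the parent-dictionary encoding are hypothetical, and an individual is counted as its own ancestor here, in line with Figure 18(a), where $c$ itself is the triple-common ancestor.

```python
def ancestors(parents, x):
    """x together with all of its ancestors."""
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node is not None and node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def triple_common_ancestors(parents, a, b, c):
    """Tri_C(a, b, c, G): individuals that reach all of a, b, and c."""
    return ancestors(parents, a) & ancestors(parents, b) & ancestors(parents, c)


def topmost(parents, candidates):
    """A candidate with no other candidate among its ancestors (A_1 in the proof)."""
    for individual in sorted(candidates):
        if ancestors(parents, individual) & candidates == {individual}:
            return individual
    return None


def remove_individual(parents, removed):
    """G': drop the individual and every edge leaving it."""
    return {child: tuple(None if p == removed else p for p in ps)
            for child, ps in parents.items() if child != removed}


# Hypothetical pedigree: A1 -> A2, a has parents A1 and A2, b and c are children of A2.
G = {"A2": ("A1", None), "a": ("A1", "A2"), "b": ("A2", None), "c": ("A2", None)}
tri_c = triple_common_ancestors(G, "a", "b", "c")
A1 = topmost(G, tri_c)
G_prime = remove_individual(G, A1)
tri_c_prime = triple_common_ancestors(G_prime, "a", "b", "c")
```

In this toy graph `tri_c` contains two ancestors and `tri_c_prime` contains one, which mirrors the step from $k + 1$ to $k$ triple-common ancestors. Computing $S(A_1)$ itself still requires the eligibility check on path-triples described above.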

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all the different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Cases for $\Phi_{aaab}$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with zero root overlap;

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with one root overlap;

Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ with zero root overlap.

Cases for $\Phi_{aabc}$:

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap;

Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap;

Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with two root 2-overlap;

Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 3-overlap;

Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap and one root 3-overlap;

Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap;

Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with zero root overlap;

Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap;

Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with two root 2-overlap;

Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 3-overlap;

Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap and one root 3-overlap.

(C1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

Figure 23: Dependency graph for different cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlap.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlap.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlap and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

Figure 24: Dependency graph for different cases for two pairs of individuals.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8, accordingly.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] "Surgeon General's New Family Health History Tool Is Released, Ready for '21st Century Medicine'," http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Ohio State University, Columbus, Ohio, USA, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


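Figure 3: Examples of path-quads (pedigree with individuals $A$, $t$, $f$, $m$, $s$, $a$, $b$, $c$, and $d$).

Path-quad 1: $A \rightarrow t \rightarrow f \rightarrow s \rightarrow a$; $A \rightarrow m \rightarrow s \rightarrow b$; $A \rightarrow c$; $A \rightarrow d$.

Path-quad 2: $A \rightarrow t \rightarrow f \rightarrow s \rightarrow a$; $A \rightarrow t \rightarrow f \rightarrow s \rightarrow b$; $A \rightarrow c$; $A \rightarrow d$.

Path-quad 3: $A \rightarrow t \rightarrow f \rightarrow s \rightarrow a$; $A \rightarrow t \rightarrow f \rightarrow s \rightarrow b$; $A \rightarrow t \rightarrow f \rightarrow s \rightarrow c$; $A \rightarrow d$.

Path-quad 4: $A \rightarrow t \rightarrow f \rightarrow s \rightarrow a$; $A \rightarrow t \rightarrow m \rightarrow s \rightarrow b$; $A \rightarrow t \rightarrow m \rightarrow s \rightarrow c$; $A \rightarrow d$.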

Table 1: The conceptual terms used for two, three, and four individuals.

| Two individuals | Three individuals | Four individuals |
| --- | --- | --- |
| Common ancestor | Triple-common ancestor | Quad-common ancestor |
| Path-pair | Path-triple | Path-quad |
| $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ | $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$ | $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$ |
| NA | 2-Overlap individual | 3-Overlap individual |
| NA | 2-Overlap path | 3-Overlap path |
| NA | Root 2-overlap path | Root 3-overlap path |
| NA | Crossover individual | Crossover individual |

the two paths share no common individuals, except $A$. In Figure 2, path-pair 2 is an acceptable path-pair, while path-pair 1, path-pair 3, and path-pair 4 are not acceptable path-pairs. The contribution of each common ancestor $A$ to $\Phi_{ab}$ is computed based on the inbreeding coefficient of $A$, modified by the length of each acceptable path-pair.

To compute $\Phi_{abc}$, the path-counting approach requires identifying all triple-common ancestors of $a$, $b$, and $c$ and summing up all triple-common ancestors' contributions to $\Phi_{abc}$. For each triple-common ancestor, denoted as $A$, we first identify all path-triples, each of which consists of three paths from $A$ to $a$, $b$, and $c$, respectively. Some examples of path-triples are presented in Figure 2.
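The enumeration step described here can be pictured with a small sketch: every path-triple from $A$ is one element of the Cartesian product of the three sets of downward paths. The function names and the child-list encoding of the pedigree are hypothetical, and the sketch only enumerates candidates; deciding which path-triples are acceptable is the harder problem discussed next.

```python
from itertools import product


def downward_paths(children, src, dst):
    """All directed paths src -> ... -> dst, each as a tuple of individuals."""
    if src == dst:
        return [(src,)]
    return [(src,) + tail
            for child in children.get(src, ())
            for tail in downward_paths(children, child, dst)]


def path_triples(children, A, a, b, c):
    """Every candidate path-triple <P_Aa, P_Ab, P_Ac> starting from A."""
    return list(product(downward_paths(children, A, a),
                        downward_paths(children, A, b),
                        downward_paths(children, A, c)))


# Hypothetical pedigree: A has children s and t; a has parents s and t;
# b is a child of s; c is a child of t.
children = {"A": ["s", "t"], "s": ["a", "b"], "t": ["a", "c"]}
triples = path_triples(children, "A", "a", "b", "c")
```

In this example there are two paths from $A$ to $a$ and one each to $b$ and $c$, so two candidate path-triples are produced.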

For $\Phi_{ab}$, only nonoverlapping path-pairs are acceptable. A path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three path-pairs: $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. For $\Phi_{abc}$, a path-triple might be acceptable even though either 2-overlap individuals or crossover individuals exist between a path-pair. The main challenge we need to address is finding necessary and sufficient conditions for acceptable path-triples.

Aiming at solving the problem of identifying acceptable path-triples, we first use a systematic method to generate all possible cases for a path-pair by considering different types of common individuals shared between the two paths. Then we introduce building blocks, which are connected graphs with conditions on every edge in the graph that encapsulate a set of acceptable cases of path-pairs. In each building block, we represent paths as nodes and interactions (i.e., shared common individuals between two paths) as edges. There are at least two paths in a building block. For each building block, we obtain all acceptable cases for the path-pairs concerned. A given path-triple can be decomposed into one or multiple building blocks. Considering a shared path-pair between two building blocks, we use the natural join operator from relational algebra to match the acceptable cases for the shared path-pair between the two building blocks. In other words, taking the acceptable cases for building blocks as inputs, we use the natural join operator to construct all acceptable cases for a path-triple. Acceptable cases for a path-triple are identified and then used in deriving the path-counting formula for $\Phi_{abc}$.
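To make the role of the natural join concrete, the sketch below joins two small relations on their shared attribute. Each row stands for one acceptable case of a building block, keyed by the path-pairs it covers. The case labels ("none", "root-overlap", "crossover") and the two example relations are purely hypothetical placeholders, not the acceptable cases derived in the paper.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts.

    Two rows are combined when they agree on every attribute they share.
    """
    joined = []
    for row_r in r:
        for row_s in s:
            shared = row_r.keys() & row_s.keys()
            if all(row_r[key] == row_s[key] for key in shared):
                joined.append({**row_r, **row_s})
    return joined


# Hypothetical building block 1 covers the path-pairs (a, b) and (a, c).
block1 = [
    {"ab": "none", "ac": "root-overlap"},
    {"ab": "none", "ac": "none"},
]
# Hypothetical building block 2 covers the path-pairs (a, c) and (b, c).
block2 = [
    {"ac": "none", "bc": "none"},
    {"ac": "root-overlap", "bc": "none"},
    {"ac": "crossover", "bc": "none"},
]

# Rows survive only when both blocks agree on the shared path-pair (a, c).
triple_cases = natural_join(block1, block2)
```

The join keeps two combined cases here, because the "crossover" row of the second block has no matching row in the first block.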

Then we summarize all the main procedures used for deriving the path-counting formula for $\Phi_{abc}$ in a flowchart shown in Figure 4. The main procedures are also applicable for deriving the path-counting formulas for $\Phi_{abcd}$ and $\Phi_{ab,cd}$.

3. Results and Discussion

3.1. Path-Counting Formulas for Three Individuals. We first introduce a systematic method to generate all possible cases for a path-pair. Then we discuss building blocks for path-triples and identify all acceptable cases, which are used in deriving the path-counting formula for $\Phi_{abc}$.

Figure 4: A flowchart for path-counting formula derivation.

3.1.1. Cases for a Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab}\rangle$ with $\mathrm{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathrm{NULL}$, where $A$ is a common ancestor of $a$ and $b$ and $\mathrm{Bi\_C}(P_{Aa}, P_{Ab})$ consists of all common individuals shared between $P_{Aa}$ and $P_{Ab}$ except $A$, we introduce three patterns (i.e., crossover, 2-overlap, and root 2-overlap) to generate all possible cases for $\langle P_{Aa}, P_{Ab}\rangle$:

(1) $X(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ share one or multiple crossover individuals;

(2) $T(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are root 2-overlapping from $A$, and the root 2-overlap path can have one or multiple 2-overlap individuals;

(3) $Y(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are overlapping but not from $A$, and the 2-overlap path can have one or multiple 2-overlap individuals.

Based on the three patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$, we use regular expressions to generate all possible cases for the path-pair $\langle P_{Aa}, P_{Ab}\rangle$. For convenience, we drop $\langle P_{Aa}, P_{Ab}\rangle$ and write $X$, $T$, and $Y$ instead of $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$ whenever there is no confusion. When $\mathrm{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathrm{NULL}$, the eight cases shown in (7) cover all possible cases for $\langle P_{Aa}, P_{Ab}\rangle$. The completeness of the eight cases in (7) can be proved by induction on the total number of $T$, $X$, and $Y$ appearing in $\langle P_{Aa}, P_{Ab}\rangle$. Using the pedigree in Figure 2, Cases 1–3 and Case 6 are illustrated in (8), (9), (10), and (11).

$$
\begin{aligned}
&\text{Case 1: } T, &&\text{Case 2: } X^{+},\\
&\text{Case 3: } TX^{+}, &&\text{Case 4: } T(X^{+}Y)^{+},\\
&\text{Case 5: } T(X^{+}Y)^{+}X^{+}, &&\text{Case 6: } X^{+}Y,\\
&\text{Case 7: } X^{+}(YX^{+})^{+}, &&\text{Case 8: } X^{+}(YX^{+})^{+}Y.
\end{aligned}
\tag{7}
$$
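The eight cases in (7) can be written directly as regular expressions over the alphabet {T, X, Y}. The following Python sketch is only an illustration, not part of the original derivation. The helper names `matching_cases` and `is_well_formed` are hypothetical, and the well-formedness rules (T may only appear first, and Y must directly follow X) are assumptions read off the pattern definitions above. Under these assumptions, the sketch checks on short strings that every well-formed pattern falls into exactly one case.

```python
import re
from itertools import product

# The eight cases of (7) as regular expressions (matched against the whole string).
CASES = {
    1: r"T",
    2: r"X+",
    3: r"TX+",
    4: r"T(X+Y)+",
    5: r"T(X+Y)+X+",
    6: r"X+Y",
    7: r"X+(YX+)+",
    8: r"X+(YX+)+Y",
}


def matching_cases(pattern):
    """Return the case numbers of (7) whose expression matches `pattern` entirely."""
    return [n for n, rx in CASES.items() if re.fullmatch(rx, pattern)]


def is_well_formed(pattern):
    """Assumed rules: non-empty, T only in first position, Y directly after X."""
    if not pattern or "T" in pattern[1:]:
        return False
    for i, ch in enumerate(pattern):
        if ch == "Y" and (i == 0 or pattern[i - 1] != "X"):
            return False
    return True


def check_up_to(max_len):
    """True if each string up to max_len matches one case iff it is well formed."""
    for n in range(1, max_len + 1):
        for letters in product("TXY", repeat=n):
            p = "".join(letters)
            expected = 1 if is_well_formed(p) else 0
            if len(matching_cases(p)) != expected:
                return False
    return True
```

For instance, `matching_cases("TX")` selects Case 3 and `matching_cases("XY")` selects Case 6. This is a finite check on short strings and does not replace the induction argument mentioned in the text.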

Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ (scenarios $S_0$–$S_3$).

$$
\left.
\begin{aligned}
&A \to s \to e \to t \to a\\
&A \to s \to e \to t \to b
\end{aligned}
\right\} \in T,
\tag{8}
$$

where $s$, $e$, and $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path.

$$
\left.
\begin{aligned}
&A \to s \to e \to t \to a\\
&A \to s \to f \to t \to b
\end{aligned}
\right\} \in TX,
\tag{9}
$$

where $s$ is a 2-overlap individual, the overlap path is a root 2-overlap path, and $t$ is a crossover individual.

$$
\left.
\begin{aligned}
&A \to s \to e \to t \to a\\
&A \to d \to f \to t \to b
\end{aligned}
\right\} \in X,
\tag{10}
$$

where $t$ is a crossover individual.

$$
\left.
\begin{aligned}
&A \to c \to e \to t \to a\\
&A \to s \to e \to t \to b
\end{aligned}
\right\} \in XY,
\tag{11}
$$

where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.
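The four examples (8)–(11) can be classified mechanically. The sketch below is a hedged illustration rather than the authors' implementation: the function name `classify_path_pair` and the representation of a path as a list of individuals starting at the common ancestor are assumptions. A shared individual reached through the same parent on both paths is treated as an overlap individual (T if the shared edges start at $A$, Y otherwise), and one reached through different parents as a crossover individual (X).

```python
def classify_path_pair(path_a, path_b):
    """Return the T/X/Y pattern string of a path-pair.

    Both paths are lists of individuals that start at the same common ancestor A.
    Consecutive overlap individuals are compressed into a single T or Y,
    while every crossover individual contributes one X.
    """
    if path_a[0] != path_b[0]:
        raise ValueError("both paths must start at the same ancestor")
    # Map each individual on a path to its predecessor (its parent on that path).
    pred_a = {child: parent for parent, child in zip(path_a, path_a[1:])}
    pred_b = {child: parent for parent, child in zip(path_b, path_b[1:])}

    pattern = []
    in_root_overlap = True  # True while the two paths still share every edge from A
    for v in path_a[1:]:
        if v not in pred_b:          # v is not shared: any root overlap has ended
            in_root_overlap = False
            continue
        if pred_a[v] == pred_b[v]:   # same parent on both paths: overlap individual
            symbol = "T" if in_root_overlap else "Y"
        else:                        # different parents: crossover individual
            symbol = "X"
            in_root_overlap = False
        if symbol == "X" or not pattern or pattern[-1] != symbol:
            pattern.append(symbol)
    return "".join(pattern)
```

Applied to the path-pairs in (8), (9), (10), and (11), the function yields `"T"`, `"TX"`, `"X"`, and `"XY"`, respectively, in agreement with the cases stated above.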

3.1.2. Path-Pair Level Graphical Representation of a Path-Triple. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, we represent each path as a node. The path-triple can be decomposed into three path-pairs (i.e., $\langle P_{Aa}, P_{Ab}\rangle$, $\langle P_{Aa}, P_{Ac}\rangle$, and $\langle P_{Ab}, P_{Ac}\rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual) other than $A$, then there is an edge between the two nodes representing the two paths. We therefore obtain four different scenarios, $S_0$–$S_3$, shown in Figure 5.

In Figure 5, the scenario $S_0$ has no edges, which means that $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ consists of three independent paths. In Figure 2, path-triple 1 is an example of $S_0$. Next, we introduce a lemma that helps identify the options for the edges in the scenarios $S_1$–$S_3$.

Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab}\rangle$, $\langle P_{Aa}, P_{Ac}\rangle$, and $\langle P_{Ab}, P_{Ac}\rangle$. If there is a 2-overlap edge, represented by $Y$ in the regular expression representation, in any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no contribution to $\Phi_{abc}$.

Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at allele level starting from a triple-common ancestor $A$ to the three individuals $a$, $b$, and $c$.

Figure 6: Examples of pedigree and inheritance paths. (a) Pedigree. (b) Inheritance paths.

For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ listed as follows: $P_{Aa}$: $A \to a$; $P_{Ab}$: $A \to p_3 \to p_6 \to p_7 \to b$; $P_{Ac}$: $A \to p_4 \to p_6 \to p_7 \to c$. For $\langle P_{Ab}, P_{Ac}\rangle$, $p_6$ is a crossover individual, $p_7$ is an overlap individual, and $p_6 \to p_7$ is a 2-overlap edge, represented by $Y$ in the regular expression representation (see the definition of $Y$ in Section 3.1.1).

For the individual $p_6$, let us denote the two alleles at one fixed autosomal locus as $g_1$ and $g_2$. At allele level, only one allele can be passed down from $p_6$ to $p_7$. Since $p_3$ and $p_4$ are the parents of $p_6$, $g_1$ is passed down from one parent and $g_2$ is passed down from the other parent. It is infeasible to pass down both $g_1$ and $g_2$ from $p_6$ to $p_7$. In other words, there are no corresponding inheritance paths for the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with a 2-overlap edge between $\langle P_{Ab}, P_{Ac}\rangle$ (i.e., Case 6: $XY$). Therefore, such path-triples have no contribution to $\Phi_{abc}$.

Figure 6(b) shows one example of eligible inheritance paths corresponding to a pedigree graph. Each individual is represented by two allele nodes. The eligible inheritance paths in Figure 6(b) consist of red edges only.

Only Case 1, Case 2, and Case 3 do not have $Y$ in the regular expression representation of a path-pair (see (7)). Considering the scenarios $S_1$–$S_3$ shown in Figure 5, an edge can therefore have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$.

3.1.3. Constructing Cases for a Path-Triple. For the scenarios $S_1$–$S_3$ in Figure 5, we define two building blocks $\{B_1, B_2\}$, along with some rules in Figure 7, to generate acceptable cases. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, we cannot allow both edges to be root overlap, because if two edges are root overlap then $P_{Aa}$ and $P_{Ac}$ must share at least one common individual other than $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

Figure 7: Building blocks $\{B_1, B_2\}$ and basic rules. For $B_1$ (nodes $P_{Aa}$, $P_{Ab}$), the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$ (nodes $P_{Aa}$, $P_{Ab}$, $P_{Ac}$), there can be at most one edge belonging to root overlap (either $T$ or $TX$).

Figure 8: A graphical illustration for obtaining $T_3 = R_1 \bowtie R_2 \bowtie R_3$ from the decomposition of $S_3$ (edges $e_1$, $e_2$, $e_3$) into $u_1$, $u_2$, $u_3$. Note: $R_i$ denotes all acceptable path-triples for $u_i$.

Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ into $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples, denoted as $R_i$.

Considering the dependency among $\{R_1, R_2, R_3\}$, we use the natural join operator, denoted as $\bowtie$, operating on $\{R_1, R_2, R_3\}$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ in the scenario $S_3$.
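The construction $T_3 = R_1 \bowtie R_2 \bowtie R_3$ can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the assignment of edges to the three copies of $B_2$ ($u_1 = \{e_1, e_2\}$, $u_2 = \{e_2, e_3\}$, $u_3 = \{e_1, e_3\}$) is assumed, since Figure 8 is not reproduced here, and the names `b2_cases` and `natural_join` are hypothetical. Each acceptable case is a dictionary mapping an edge name to one of the options $T$, $X$, or $TX$.

```python
from itertools import product

EDGE_OPTIONS = ("T", "X", "TX")   # options for one edge (Cases 1-3)
ROOT_OVERLAP = {"T", "TX"}        # options that involve a root 2-overlap


def b2_cases(edge_1, edge_2):
    """Acceptable cases of building block B2: at most one root-overlap edge."""
    return [
        {edge_1: x, edge_2: y}
        for x, y in product(EDGE_OPTIONS, repeat=2)
        if not (x in ROOT_OVERLAP and y in ROOT_OVERLAP)
    ]


def natural_join(left, right):
    """Natural join of two relations given as lists of dicts (attribute -> value)."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                joined.append({**l_row, **r_row})
    return joined


# Assumed decomposition of S3 (a triangle on e1, e2, e3) into three copies of B2.
R1 = b2_cases("e1", "e2")
R2 = b2_cases("e2", "e3")
R3 = b2_cases("e1", "e3")
T3 = natural_join(natural_join(R1, R2), R3)
```

Under these assumptions, `T3` contains seven cases: the case in which all three edges are $X$, plus the six cases in which exactly one edge is $T$ or $TX$ and the other two are $X$.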

For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$. The scenario $S_0$ has no edges, which shows that $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$) the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or

(2) one edge belongs to root 2-overlap and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but no 2-overlap path.

3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator that transforms a path-triple with crossover individuals into a noncrossover path-triple without changing the contribution of this path-triple to $\Phi_{abc}$. The main purpose of the splitting operator is to simplify the derivation of the path-counting formula. We first use the example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ into two nodes $s_1$ and $s_2$;

(2) transform the edges $s \to a'$ and $s \to b'$ into $s_1 \to a'$ and $s_2 \to b'$, respectively;

(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
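The three steps above can be sketched on an edge-set representation of the pedigree graph. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `split_crossover` and the naming of the new nodes are hypothetical, and the redirection of the incoming edges $x \to s$ and $w \to s$ to $s_1$ and $s_2$ is read off the paths listed in Figure 9 rather than from the three steps themselves.

```python
def split_crossover(edges, s, x, w, a_child, b_child):
    """Apply the splitting operator to the crossover individual `s`.

    edges   -- set of (parent, child) pairs describing the pedigree graph
    x, w    -- parents through which P_Aa and P_Ab reach s (assumed from Figure 9)
    a_child -- child of s on P_Aa (a' in the text)
    b_child -- child of s on P_Ab (b' in the text)
    Returns a new edge set; the input set is left unchanged.
    """
    s1, s2 = f"{s}1", f"{s}2"     # step (1): split s into s1 and s2
    new_edges = set(edges) - {(x, s), (w, s), (s, a_child), (s, b_child)}
    new_edges |= {
        (x, s1), (w, s2),         # assumed: incoming edges follow the two paths
        (s1, a_child),            # step (2): s -> a' becomes s1 -> a'
        (s2, b_child),            # step (2): s -> b' becomes s2 -> b'
        (s2, a_child),            # step (3): new edge s2 -> a'
        (s1, b_child),            # step (3): new edge s1 -> b'
    }
    return new_edges


# Small example modeled on Figure 9 (paths A -> x -> s -> a' -> a, and so on).
example = {
    ("A", "x"), ("A", "w"), ("x", "s"), ("w", "s"),
    ("s", "a'"), ("s", "b'"), ("a'", "a"), ("b'", "b"), ("A", "c"),
}
split_graph = split_crossover(example, "s", "x", "w", "a'", "b'")
```

After the split, the paths $A \to x \to s_1 \to a' \to a$ and $A \to w \to s_2 \to b' \to b$ no longer share an individual other than $A$, which is the situation shown for $G_k$ in Figure 9.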

Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, that is, no descendant of $s$ is a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After applying the splitting operator to the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.

Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If a new crossover node appears, the only possible node is either $a'$ or $b'$. Assume that $b'$ becomes a crossover individual; this means that $b'$ is able to reach $a$ and $b$ through two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.

Next, we introduce a canonical graph, which results from applying the splitting operator to all crossover individuals. The canonical graph has no crossover individuals.

Definition 5 (Canonical Graph). Let $G$ be a pedigree graph having one or more crossover individuals regarding $\Phi_{abc}$. Suppose there exists a graph $G'$ that has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ that has the same contribution to $\Phi_{abc}$ as the one in $G$;

(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ that has the same contribution to $\Phi_{abc}$ as the one in $G'$.

We call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.

Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, there exists a canonical graph $G'$ for $G$.

Figure 9: Transforming pedigree graph $G_{k+1}$, having $k + 1$ crossovers, to $G_k$, having $k$ crossovers. For $G_{k+1}$, $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ = $A \to \cdots \to x \to s \to a' \to \cdots \to a$; $A \to \cdots \to w \to s \to b' \to \cdots \to b$; $A \to c$. For $G_k$, $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ = $A \to \cdots \to x \to s_1 \to a' \to \cdots \to a$; $A \to \cdots \to w \to s_2 \to b' \to \cdots \to b$; $A \to c$.

Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ (scenarios $S_0$–$S_{10}$).

Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between the paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator to $s$ in $G_{k+1}$ and, by Lemma 4, obtain $G_k$ having $k$ crossovers. By the induction hypothesis, $G_k$ has a canonical graph $G'$; since the splitting operator does not change the contribution of a path-triple to $\Phi_{abc}$, $G'$ is also a canonical graph for $G_{k+1}$.

3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:

$$
\Phi_{abc} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+1}\Phi_{AA}\right),
\tag{12}
$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

$$
L_{\text{triple}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2},
\end{cases}
\tag{13}
$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).
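To make (12) and (13) concrete, the following sketch computes the contribution of a single path-triple using exact fractions. It is an illustration only: the function name `triple_contribution` and its parameters are hypothetical, and the caller is assumed to have already decided whether the path-triple is of Type 1 or Type 2. Summing such contributions over all acceptable path-triples and all triple-common ancestors $A$ would give $\Phi_{abc}$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_AA(F_A):
    """Kinship coefficient of A with itself: (1/2)(1 + F_A)."""
    return Fraction(1, 2) * (1 + F_A)


def phi_AAA(F_A):
    """Generalized kinship coefficient Phi_AAA: (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * F_A)


def triple_contribution(len_a, len_b, len_c, F_A=Fraction(0), root_overlap_len=None):
    """Contribution of one acceptable path-triple to Phi_abc, following (12)-(13).

    len_a, len_b, len_c -- lengths of P_Aa, P_Ab, P_Ac (number of edges)
    F_A                 -- inbreeding coefficient of the common ancestor A
    root_overlap_len    -- length of the root 2-overlap path P_As for Type 2,
                           or None for Type 1
    """
    if root_overlap_len is None:                       # Type 1
        L = len_a + len_b + len_c
        return HALF ** L * phi_AAA(F_A)
    L = len_a + len_b + len_c - root_overlap_len       # Type 2
    return HALF ** (L + 1) * phi_AA(F_A)
```

As a usage example, for three half-siblings $a$, $b$, $c$ whose only shared parent is a noninbred $A$, the single Type 1 path-triple has three paths of length 1 and contributes $(1/2)^3 \cdot 1/4 = 1/32$. For the Type 2 path-triple $A \to s \to a$, $A \to s \to b$, $A \to c$ with noninbred $A$, $L_{\text{triple}} = 2 + 2 + 1 - 1 = 4$ and the contribution is $(1/2)^5 \cdot 1/2 = 1/64$.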

For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.

3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with $\mathrm{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios, $S_0$–$S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks $\{B_1, B_2, B_3\}$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable in Figure 11. For $B_3$, we only consider root overlap because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that, for $B_3$, if $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.

3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ into one or multiple building blocks. A scenario $S_i \in \{S_1, S_3\}$ has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$ because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.
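Because the two copies of $B_1$ in $S_2$ are disconnected, $T_2$ is simply the set of all combinations of their acceptable cases. A minimal sketch follows, assuming each copy of $B_1$ has the three edge options listed in Figure 7; the variable names are illustrative only.

```python
from itertools import product

# Acceptable cases for one copy of B1: the single edge is T, X, or TX.
R1 = ["T", "X", "TX"]
R2 = ["T", "X", "TX"]

# T2 = R1 x R2: every pairing is acceptable because u1 and u2 share no path.
T2 = list(product(R1, R2))
```

Here `T2` has nine elements, including the pairing in which both edges are root 2-overlaps, which is consistent with a path-quad being allowed up to two root 2-overlap paths.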

Figure 11: Building blocks $\{B_1, B_2, B_3\}$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. For $B_3$ (nodes $P_{Aa}$, $P_{Ab}$, $P_{Ac}$ with $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$), all three edges belong to root overlap (i.e., having root 3-overlap).

Table 2: Largest subgraph of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).

$S_i$:  $S_4$   $S_5$   $S_7$   $S_8$   $S_9$   $S_{10}$
$S_j$:  $S_3$   $S_3$   $S_6$   $S_5$   $S_7$   $S_9$

For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;

(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;

(3) there is no $S_k$ such that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.

For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply the same idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;

(2) otherwise, there can be at most two root 2-overlap paths.

3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$
\Phi_{abcd} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{quad}}}\Phi_{AAAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{quad}}+1}\Phi_{AAA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{quad}}+2}\Phi_{AA}\right),
\tag{14}
$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$;

Type 1: zero root 2-overlap and zero root 3-overlap path;

Type 2: one root 2-overlap path $P_{As}$ ending at $s$;

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;

Case 2: one root 3-overlap path $P_{At}$ ending at $t$;

Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively;

$$
L_{\text{quad}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{At}} & \text{for Case 2} \in \text{Type 3},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases}
\tag{15}
$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).
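Equations (14) and (15) can be sketched in the same way as the three-individual case. The code below is a hedged illustration, not the authors' implementation: the function name `quad_contribution` and its parameters are hypothetical, and the caller is assumed to supply the lengths of the root 2-overlap and root 3-overlap paths already identified for an acceptable path-quad.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def quad_contribution(path_lengths, F_A=Fraction(0), two_overlaps=(), three_overlap=None):
    """Contribution of one acceptable path-quad to Phi_abcd, following (14)-(15).

    path_lengths  -- lengths of P_Aa, P_Ab, P_Ac, P_Ad
    F_A           -- inbreeding coefficient of the quad-common ancestor A
    two_overlaps  -- lengths of the root 2-overlap paths (zero, one, or two values)
    three_overlap -- length of the root 3-overlap path P_At, or None
    """
    phi_AA = Fraction(1, 2) * (1 + F_A)
    phi_AAA = Fraction(1, 4) * (1 + 3 * F_A)
    phi_AAAA = Fraction(1, 8) * (1 + 7 * F_A)
    total = sum(path_lengths)
    n2 = len(two_overlaps)

    if three_overlap is None and n2 == 0:              # Type 1
        return HALF ** total * phi_AAAA
    if three_overlap is None and n2 == 1:              # Type 2
        L = total - two_overlaps[0]
        return HALF ** (L + 1) * phi_AAA
    if three_overlap is None and n2 == 2:              # Type 3, Case 1
        L = total - sum(two_overlaps)
    elif three_overlap is not None and n2 == 0:        # Type 3, Case 2
        L = total - 2 * three_overlap
    elif three_overlap is not None and n2 == 1:        # Type 3, Case 3
        L = total - three_overlap - two_overlaps[0]
    else:
        raise ValueError("not an acceptable path-quad")
    return HALF ** (L + 2) * phi_AA
```

For example, four half-siblings sharing only a noninbred parent $A$ give a Type 1 path-quad with four paths of length 1 and a contribution of $(1/2)^4 \cdot 1/8 = 1/128$. The path-quad $A \to t \to a$, $A \to t \to b$, $A \to t \to c$, $A \to d$ has one root 3-overlap path of length 1, so $L_{\text{quad}} = 7 - 2 = 5$ and the contribution is $(1/2)^7 \cdot 1/2 = 1/256$.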

For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.

Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ = $A \to s \to a$; $A \to s \to b$; $A \to t \to c$; $A \to t \to d$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ = $A \to x \to a$; $A \to y \to b$; $A \to y \to c$; $A \to x \to d$.

3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two pairs of path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td})\rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b\rangle$ and $\langle c, d\rangle$, if $s \in \mathrm{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathrm{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathrm{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.

Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.

3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$, shown in Figure 13. The options for an edge can be $T$, $X$, and $TX$ (refer to Section 3.1.1 for the definitions of $T$, $X$, and $TX$). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$) and $r_1$ and $r_2$ are the last individuals of the root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$. Each entry lists a case of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ followed by the corresponding cases of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$.

Zero root 2-overlap and zero root 3-overlap:
    zero root homo-overlap and zero root heter-overlap.
One root 2-overlap path:
    one root homo-overlap and zero root heter-overlap;
    zero root homo-overlap and one root heter-overlap.
Two root 2-overlap paths:
    two root homo-overlaps and zero root heter-overlap;
    zero root homo-overlap and two root heter-overlaps.
One root 3-overlap path:
    one root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$.
One root 2-overlap and one root 3-overlap:
    one root homo-overlap and two root heter-overlaps, and $r_1 = r_2 = h$;
    one root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$.

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
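The generation function defined above translates directly into a short memoized recursion. The sketch below is illustrative only; the function name `generations` and the dictionary-based pedigree representation (each individual mapped to a tuple of its known parents, with progenitors mapped to an empty tuple) are assumptions.

```python
def generations(parents):
    """Compute gen(.) for every individual in a pedigree.

    parents -- dict mapping each individual to a tuple of its known parents;
               progenitors map to an empty tuple.
    gen(progenitor) = 0, and gen(a) = max(gen of a's parents) + 1 otherwise.
    """
    gen = {}

    def visit(individual):
        if individual not in gen:
            known = parents.get(individual, ())
            if not known:                         # progenitor
                gen[individual] = 0
            else:                                 # one or two parents
                gen[individual] = max(visit(p) for p in known) + 1
        return gen[individual]

    for individual in parents:
        visit(individual)
    return gen


# Hypothetical pedigree: A and B are progenitors.
pedigree = {"A": (), "B": (), "s": ("A",), "t": ("s", "B"), "a": ("t", "A")}
gen = generations(pedigree)
```

In this hypothetical pedigree, `gen` assigns 0 to `A` and `B`, 1 to `s`, 2 to `t`, and 3 to `a`, because `a` takes the maximum over its parents `t` (generation 2) and `A` (generation 0).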

The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$
\begin{aligned}
\Phi_{ab,cd} = {} & \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{2\text{-pair}}}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{2\text{-pair}}+1}\Phi_{AAA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{2\text{-pair}}+2}\Phi_{AA} + \sum_{\text{Type 4}}\left(\frac{1}{2}\right)^{L_{2\text{-pair}}+1}\Phi_{AA}\right)\\
& + \sum_{(S,T) \in \text{Type 5}}\left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb}\rangle} + L_{\langle P_{Tc}, P_{Td}\rangle} + 1}\Phi_{BB},
\end{aligned}
\tag{16}
$$

where $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$: a common ancestor of $a$ and $b$; and $T$: a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).

Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ at path-pair level (scenarios $S_0$–$S_{16}$).

Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

Type 3:

$$
\begin{cases}
\text{zero root homo-overlap and two root heter-overlaps } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2, \text{ respectively};\\
\text{one root homo-overlap } P_{Ah} \text{ ending at } h \text{ and two root heter-overlaps } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2, \text{ and } r_1 = r_2.
\end{cases}
\tag{17}
$$

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td})\rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb}\rangle$ has zero overlap individuals, and $\langle P_{Tc}, P_{Td}\rangle$ has zero overlap individuals. At most one path-pair (either $\langle P_{Sa}, P_{Sb}\rangle$ or $\langle P_{Tc}, P_{Td}\rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb}\rangle$ and a path from $\langle P_{Tc}, P_{Td}\rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$
B =
\begin{cases}
S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T),\\
S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents},\\
T & \text{otherwise},
\end{cases}
$$

$$
\begin{aligned}
L_{2\text{-pair}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{Ah}} & \text{for Type 4},
\end{cases}\\
L_{\langle P_{Sa}, P_{Sb}\rangle} &= L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5},\\
L_{\langle P_{Tc}, P_{Td}\rangle} &= L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}.
\end{aligned}
\tag{18}
$$

Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$
\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \cdot \Phi_{TT}. \tag{19}
$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.

3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by comparing it with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node are a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with in-degree 0). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as NC($u$), is assigned using a breadth-first-search traversal starting from $r$ as follows.

(1) If $u$ is $r$, then NC($r$) contains only one element: the empty string.

(2) Otherwise, let $u$ be a node with NC($u$), and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then, for each $x$ in NC($u$), a code $x\,i\,{*}$ is added to NC($v_i$), where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
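The labeling rules above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in [11, 12, 14, 15]: the function name `assign_nodecodes`, the dictionary-based pedigree, and the encoding of each code as a tuple of (sibling index, gender) pairs are assumptions made here for readability. The sketch also visits nodes in topological order rather than plain breadth-first order, an assumption that guarantees NC($u$) is complete before it is propagated to the children of $u$.

```python
from collections import defaultdict, deque


def assign_nodecodes(children, gender, root):
    """Assign NodeCodes to every node reachable from `root`.

    children: dict mapping a node to the list of its children in sibling order
    gender:   dict mapping a node to a gender marker such as "M" or "F"
    Returns a dict mapping each node to a set of codes; a code is a tuple of
    (sibling_index, gender) pairs, and the empty tuple plays the role of the
    empty string assigned to the progenitor.
    """
    # Count incoming edges so a node is expanded only after all of its
    # parents have contributed their codes (topological order).
    indegree = defaultdict(int)
    for kids in children.values():
        for v in kids:
            indegree[v] += 1

    codes = defaultdict(set)
    codes[root].add(())
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            for x in codes[u]:
                # Rule (2): extend each code x of u by the pair (i, gender of v_i).
                codes[v].add(x + ((i, gender[v]),))
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return dict(codes)


# Hypothetical toy pedigree: virtual progenitor r, founders f and m,
# and two children a and b who share both parents.
toy_children = {"r": ["f", "m"], "f": ["a", "b"], "m": ["a", "b"]}
toy_gender = {"r": "M", "f": "M", "m": "F", "a": "M", "b": "F"}
toy_codes = assign_nodecodes(toy_children, toy_gender, "r")
```

Each individual receives one code per distinct path from the progenitor, so in the toy pedigree `a` and `b` carry two codes each.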


The computations of kinship coefficients for two individuals and of generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and of the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.

Figure 14: The effect of pedigree size on computation efficiency improvement. (Plot of average time in ms against the number of individuals in the pedigree, for Recursive and NodeCodes.)

Figure 15: The effect of depth on computation efficiency improvement. (Plot of average time in ms against depth, from 4 to 18, for Recursive and NodeCodes.)

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.

Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.

Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$. (Panels $S_0$ to $S_3$; where $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the two paths are drawn as a single path $P_{Aa}$.)

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$, by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$
\Phi_{aab} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA} \right), \tag{A.1}
$$

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap.

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

$$
\begin{aligned}
L_{\text{triple}} &= \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases} \\
L_{\langle P_{Aa}, P_{Ab} \rangle} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A.2}
$$

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.

A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

$$
\begin{aligned}
\Phi_{aabc} = {} & \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}} - 1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AA} \right) \\
& + \sum_{A} \left( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 2} \Phi_{AA} \right),
\end{aligned}
\tag{A.3}
$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap path and one root 3-overlap path, $P_{As}$ and $P_{At}$, ending at $s$ and $t$, respectively. (A.4)

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$
\begin{aligned}
L_{\text{quad}} &= \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases} \\
L_{\text{triple}} &= \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}.
\end{cases}
\end{aligned}
\tag{A.5}
$$


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ can be condensed to $\langle P_{Aa}, P_{Ab} \rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must therefore be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as a default mergeable path-pair. Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

$$
\Phi_{aaab} = \sum_{A} \left( \frac{3}{2} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} \right) + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{pair}} + 2} \Phi_{AA} \right), \tag{A.6}
$$

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has zero root 2-overlap path.

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping.

$$
\begin{aligned}
L_{\text{triple}} &= \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases} \\
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A.7}
$$

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$. (Nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$.)

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases for $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$.

Cases for $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap. (B.1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) For $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2.

(ii) For Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$.

(iii) For Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Legend: parent-child relationship; ancestor-descendant relationship.)

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^2 \Phi_{AAc} = (1/2)^3 \Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = (1/2)^3 \Phi_{AAA}$.
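The two base cases above can also be checked numerically. The following is a minimal sketch, not the Recursive implementation evaluated in Section 3.4: it applies the recursive formulas (3) and (4) as quoted in this appendix, together with the usual rules $\Phi_{aa} = \frac{1}{2}(1 + \Phi_{fm})$, $\Phi_{ab} = \frac{1}{2}(\Phi_{fb} + \Phi_{mb})$, and $\Phi_{aaa} = \frac{1}{4}(1 + 3\Phi_{fm})$, which are assumed here from Karigl's recursion. The function names, the `ped` dictionary layout, the treatment of a missing parent as contributing 0, and the two toy pedigrees mirroring Figures 18(a) and 18(b) are all illustrative assumptions.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def depth(ped, x):
    """Length of the longest path from x up to a founder (founders have depth 0)."""
    known = [p for p in ped[x] if p is not None]
    return 1 + max(depth(ped, p) for p in known) if known else 0


def phi2(ped, a, b):
    """Kinship coefficient of a and b; a missing individual (None) contributes 0."""
    if a is None or b is None:
        return Fraction(0)
    if a == b:
        f, m = ped[a]
        return HALF * (1 + phi2(ped, f, m))
    if depth(ped, a) < depth(ped, b):
        a, b = b, a  # recurse on the deeper individual, which cannot be an ancestor of the other
    f, m = ped[a]
    return HALF * (phi2(ped, f, b) + phi2(ped, m, b))


def phi3(ped, a, b, c):
    """Generalized kinship coefficient for three individuals."""
    if None in (a, b, c):
        return Fraction(0)
    trio = [a, b, c]
    x = max(trio, key=lambda v: depth(ped, v))  # deepest individual: not an ancestor of the rest
    rest = [v for v in trio if v != x]
    f, m = ped[x]
    if len(rest) == 0:  # Phi_xxx (rule assumed from Karigl's recursion)
        return Fraction(1, 4) * (1 + 3 * phi2(ped, f, m))
    if len(rest) == 1:  # Phi_xxy, recursive formula (4)
        y = rest[0]
        return HALF * (phi2(ped, x, y) + phi3(ped, f, m, y))
    y, z = rest  # Phi_xyz, recursive formula (3)
    return HALF * (phi3(ped, f, y, z) + phi3(ped, m, y, z))


# Toy pedigrees (child -> (father, mother)); None marks a missing parent.
fig18a = {"c": (None, None), "a": ("c", None), "b": ("c", None)}
fig18b = {"A": (None, None), "a": ("A", None), "b": ("A", None), "c": ("A", None)}

for ped, root, length in ((fig18a, "c", 2), (fig18b, "A", 3)):
    recursive = phi3(ped, "a", "b", "c")
    path_count = HALF ** length * phi3(ped, root, root, root)  # single Type 1 path-triple
    print(recursive, path_count)
```

In both toy pedigrees the recursive value and the single Type 1 path-counting term coincide, which is what the basis of the proof asserts.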

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition of the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus, $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2} \Phi_{fbc} = \frac{1}{2} \cdot \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AAA}, \\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}.
\end{aligned}
\tag{B.2}
$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc}$.

Thus, it is true for $n = k + 1$.

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.

Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^3 \Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^4 \Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2 \Phi_{ssc} = (1/2)^3 \Phi_{sc} = (1/2)^5 \Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{cc} = (1/2)^{2+1} \Phi_{cc} = (1/2)^3 \Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4 \Phi_{AA}$ and $\Phi_{abc} = (1/2)^5 \Phi_{AA}$, respectively.
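As a concrete check of the basis for Figure 20(a), suppose (an assumption made only for this illustration) that $c$ is a non-inbred founder, so that $\Phi_{cc} = 1/2$. The paths are $P_{ca} = c \to b \to a$, $P_{cb} = c \to b$, and the trivial path $P_{cc}$; the root overlap path shared by $P_{ca}$ and $P_{cb}$ ends at $s = b$. Under these assumptions, the Type 2 term works out as follows:

```latex
\begin{align*}
L_{P_{ca}} &= 2, \quad L_{P_{cb}} = 1, \quad L_{P_{cc}} = 0, \quad L_{P_{cs}} = 1 \ (s = b), \\
L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle} &= 2 + 1 + 0 - 1 = 2, \\
\Phi_{abc} &= \left(\tfrac{1}{2}\right)^{2+1} \Phi_{cc}
            = \tfrac{1}{8} \cdot \tfrac{1}{2} = \tfrac{1}{16}.
\end{align*}
```

The recursive route gives the same value: $\Phi_{abc} = \frac{1}{2}\Phi_{bbc} = \frac{1}{4}\Phi_{bc} = \frac{1}{8}\Phi_{cc} = \frac{1}{16}$.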

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2} \Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA}, \\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}.
\end{aligned}
\tag{B.3}
$$

For Figures 21(b) and 21(c), we take the same steps as we did to calculate $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,

$$
\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B.4}
$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2) \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, it shows that the path-counting formula (A.1) is true for Case 2.3.
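Wright's formula, on which the proof above relies, can be illustrated by brute-force path enumeration on a small pedigree. The sketch below is only an illustration under stated assumptions: `paths_down`, `kinship_by_paths`, and the `parents` dictionary layout are hypothetical names, $\Phi_{AA}$ is taken as $\frac{1}{2}(1 + \Phi_{fm})$ for the parents $f$, $m$ of $A$, and the exhaustive enumeration is exponential in the worst case, so it does not stand in for the NodeCodes-based method of Section 3.4.

```python
from fractions import Fraction


def paths_down(children, src, dst):
    """All directed paths (as tuples of nodes) leading from src down to dst."""
    if src == dst:
        return [(src,)]
    found = []
    for child in children.get(src, ()):
        for tail in paths_down(children, child, dst):
            found.append((src,) + tail)
    return found


def kinship_by_paths(parents, a, b):
    """Phi_ab as a sum over nonoverlapping path-pairs from every common ancestor.

    parents: dict mapping each individual to (father, mother); None = missing.
    """
    # Invert the parent map so that paths can be followed downwards.
    children = {}
    for child, ps in parents.items():
        for p in ps:
            if p is not None:
                children.setdefault(p, []).append(child)

    def phi_self(x):
        # Phi_xx = 1/2 * (1 + Phi_fm); a missing parent means no inbreeding term.
        f, m = parents[x]
        if f is None or m is None:
            return Fraction(1, 2)
        return Fraction(1, 2) * (1 + kinship_by_paths(parents, f, m))

    total = Fraction(0)
    for anc in parents:
        for pa in paths_down(children, anc, a):
            for pb in paths_down(children, anc, b):
                # Nonoverlapping path-pair: the two paths share only the root.
                if set(pa) & set(pb) == {anc}:
                    length = (len(pa) - 1) + (len(pb) - 1)
                    total += Fraction(1, 2) ** length * phi_self(anc)
    return total


# Hypothetical example: full siblings a and b with founder parents f and m.
sib_parents = {"f": (None, None), "m": (None, None), "a": ("f", "m"), "b": ("f", "m")}
sib_kinship = kinship_by_paths(sib_parents, "a", "b")
```

For the full-sibling example, each of the two common ancestors contributes one nonoverlapping path-pair of total length 2, and the contributions are summed, as in the multiple-ancestor argument of Appendix B.2.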

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).

Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. The proof is by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, (12) and (A.1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Consider a path-triple from $A_1$ to $a$, $b$, and $c$:

$$
pt_i = \{\, t_a : A_1 \to \cdots \to a,\ \ t_b : A_1 \to \cdots \to b,\ \ t_c : A_1 \to \cdots \to c \,\}.
$$

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.
$\Longleftrightarrow \Phi_{aaab}$

Figure 23: Dependency graph for different cases for four individuals.

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlap.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
$\Longleftrightarrow \Phi_{aabc}$

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlap.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.
$\Longleftrightarrow \Phi_{abcd}$

(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formula for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pair regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlap.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlap.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlap and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formula for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Figure 24: Dependency graph for different cases for two pairs of individuals.

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Ohio State University, Columbus, Ohio, USA, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


Figure 4: A flowchart for path-counting formula derivation.

for a path-pair. Then we discuss building blocks for path-triples and identify all acceptable cases, which are used in deriving the path-counting formula for $\Phi_{abc}$.

3.1.1. Cases for a Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab}\rangle$ with $Bi\_C(P_{Aa}, P_{Ab}) \ne \text{NULL}$, where $A$ is a common ancestor of $a$ and $b$, and $Bi\_C(P_{Aa}, P_{Ab})$ consists of all common individuals shared between $P_{Aa}$ and $P_{Ab}$ except $A$, we introduce three patterns (i.e., crossover, 2-overlap, and root 2-overlap) to generate all possible cases for $\langle P_{Aa}, P_{Ab}\rangle$:

(1) $X(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ share one or multiple crossover individuals;

(2) $T(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are root 2-overlapping from $A$, and the root 2-overlap path can have one or multiple 2-overlap individuals;

(3) $Y(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are overlapping, but not from $A$, and the 2-overlap path can have one or multiple 2-overlap individuals.

Based on the three patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$, we use regular expressions to generate all possible cases for the path-pair $\langle P_{Aa}, P_{Ab}\rangle$. For convenience, we drop $\langle P_{Aa}, P_{Ab}\rangle$ and use $X$, $T$, and $Y$ instead of the patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$ whenever there is no confusion. When $Bi\_C(P_{Aa}, P_{Ab}) \ne \text{NULL}$, the eight cases shown in (7) cover all possible cases for $\langle P_{Aa}, P_{Ab}\rangle$. The completeness of the eight cases shown in (7) can be proved by induction on the total number of $T$, $X$, and $Y$ appearing in $\langle P_{Aa}, P_{Ab}\rangle$. Using the pedigree in Figure 2, Cases 1–3 and Case 6 are illustrated in (8), (9), (10), and (11).

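$$
\begin{aligned}
&\text{Case 1: } T, &\qquad &\text{Case 2: } X^{+},\\
&\text{Case 3: } TX^{+}, &\qquad &\text{Case 4: } T(X^{+}Y)^{+},\\
&\text{Case 5: } T(X^{+}Y)^{+}X^{+}, &\qquad &\text{Case 6: } X^{+}Y,\\
&\text{Case 7: } X^{+}(YX^{+})^{+}, &\qquad &\text{Case 8: } X^{+}(YX^{+})^{+}Y.
\end{aligned}
\tag{7}
$$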

$$
\left.
\begin{aligned}
A \to s \to e \to t \to a\\
A \to s \to e \to t \to b
\end{aligned}
\right\} \in T, \tag{8}
$$

where $s$, $e$, $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path;

Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ (scenarios $S_0$, $S_1$, $S_2$, $S_3$).

$$
\left.
\begin{aligned}
A \to s \to e \to t \to a\\
A \to s \to f \to t \to b
\end{aligned}
\right\} \in TX, \tag{9}
$$

where $s$ is a 2-overlap individual and the overlap path is a root 2-overlap path; $t$ is a crossover individual;

$$
\left.
\begin{aligned}
A \to s \to e \to t \to a\\
A \to d \to f \to t \to b
\end{aligned}
\right\} \in X, \tag{10}
$$

where $t$ is a crossover individual;

$$
\left.
\begin{aligned}
A \to c \to e \to t \to a\\
A \to s \to e \to t \to b
\end{aligned}
\right\} \in XY, \tag{11}
$$

where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.
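The four examples in (8)–(11) can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes that a shared individual reached through the same incoming edge in both paths is a 2-overlap individual (labeled $T$ if the overlap run starts at the root, $Y$ otherwise) and that a shared individual reached through different incoming edges is a crossover individual ($X$). The helper names `pattern_of` and `classify` are hypothetical.

```python
import re

# Regular expressions for the eight cases in (7), in the order listed.
CASES = [
    ("Case 1", r"T"),
    ("Case 2", r"X+"),
    ("Case 3", r"TX+"),
    ("Case 4", r"T(X+Y)+"),
    ("Case 5", r"T(X+Y)+X+"),
    ("Case 6", r"X+Y"),
    ("Case 7", r"X+(YX+)+"),
    ("Case 8", r"X+(YX+)+Y"),
]


def pattern_of(path_a, path_b):
    """Return the T/X/Y string of two paths that start at the same ancestor.

    Assumption (hypothetical reading of the definitions): an individual
    reached by the same edge in both paths is a 2-overlap individual;
    a shared individual reached by different edges is a crossover.
    """
    assert path_a[0] == path_b[0], "both paths must start at the ancestor"
    root = path_a[0]
    pred_b = {v: u for u, v in zip(path_b, path_b[1:])}
    symbols = []
    in_overlap = False
    for u, v in zip(path_a, path_a[1:]):
        if v not in pred_b:            # v is not shared by the two paths
            in_overlap = False
        elif pred_b[v] == u:           # shared edge u -> v: overlap individual
            if not in_overlap:         # emit one symbol per overlap run
                symbols.append("T" if u == root else "Y")
                in_overlap = True
        else:                          # shared node, different incoming edges
            symbols.append("X")
            in_overlap = False
    return "".join(symbols)


def classify(pattern):
    """Return the name of the case in (7) matching the pattern, or None."""
    for name, regex in CASES:
        if re.fullmatch(regex, pattern):
            return name
    return None


if __name__ == "__main__":
    examples = {
        "(8)": (list("Aseta"), list("Asetb")),
        "(9)": (list("Aseta"), list("Asftb")),
        "(10)": (list("Aseta"), list("Adftb")),
        "(11)": (list("Aceta"), list("Asetb")),
    }
    for label, (pa, pb) in examples.items():
        p = pattern_of(pa, pb)
        print(label, p, classify(p))
```

Under these assumptions the four path-pairs are labeled $T$, $TX$, $X$, and $XY$, which fall under Case 1, Case 3, Case 2, and Case 6, respectively.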

3.1.2. Path-Pair Level Graphical Representation of a Path-Triple. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, we represent each path as a node. The path-triple can be decomposed to three path-pairs (i.e., $\langle P_{Aa}, P_{Ab}\rangle$, $\langle P_{Aa}, P_{Ac}\rangle$, and $\langle P_{Ab}, P_{Ac}\rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual) except $A$, then there is an edge between the two nodes representing the two paths. Therefore, we obtain four different scenarios $S_0$–$S_3$, shown in Figure 5.
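The construction of the path-pair level graph can be sketched in a few lines. This is an illustration only: the function name `scenario_edges` and the dictionary layout are hypothetical, and the sample paths are chosen for illustration. Since the four scenarios for three paths differ in the number of edges, the scenario index is taken here to be the edge count.

```python
from itertools import combinations


def scenario_edges(paths):
    """Return the pairs of paths sharing an individual other than the root.

    `paths` maps a path name to its list of individuals, all starting at
    the same common ancestor (hypothetical input layout).
    """
    root = next(iter(paths.values()))[0]
    edges = []
    for (name1, p1), (name2, p2) in combinations(paths.items(), 2):
        shared = (set(p1) & set(p2)) - {root}
        if shared:
            edges.append((name1, name2))
    return edges


if __name__ == "__main__":
    # Illustrative path-triple: P_Ab and P_Ac share p6 and p7.
    triple = {
        "P_Aa": ["A", "a"],
        "P_Ab": ["A", "p3", "p6", "p7", "b"],
        "P_Ac": ["A", "p4", "p6", "p7", "c"],
    }
    edges = scenario_edges(triple)
    print(f"S{len(edges)}", edges)  # S1 [('P_Ab', 'P_Ac')]
```

Only the existence of an edge is decided here; whether the edge is a crossover, a root 2-overlap, or a 2-overlap is a separate question.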

In Figure 5, the scenario $S_0$ has no edges, which means that $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ consists of three independent paths. In Figure 2, path-triple 1 is an example of $S_0$. Next, we introduce a lemma which can assist with identifying the options for the edges in the scenarios $S_1$–$S_3$.

Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab}\rangle$, $\langle P_{Aa}, P_{Ac}\rangle$, and $\langle P_{Ab}, P_{Ac}\rangle$. If there is a 2-overlap edge, which is represented by $Y$ in the regular expression representation of any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no contribution to $\Phi_{abc}$.

Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at allele-level starting from a triple-common ancestor $A$ to the three individuals $a$, $b$, and $c$.

Figure 6: Examples of pedigree and inheritance paths. (a) Pedigree. (b) Inheritance paths.

For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ listed as follows: $P_{Aa}: A \to a$; $P_{Ab}: A \to p_3 \to p_6 \to p_7 \to b$; $P_{Ac}: A \to p_4 \to p_6 \to p_7 \to c$.

For $\langle P_{Ab}, P_{Ac}\rangle$, $p_6$ is a crossover individual, $p_7$ is an overlap individual, and $p_6 \to p_7$ is a 2-overlap edge, represented by $Y$ in the regular expression representation (see the definition of $Y$ in Section 3.1.1).

For the individual $p_6$, let us denote the two alleles at one fixed autosomal locus as $g_1$ and $g_2$. At allele-level, only one allele can be passed down from $p_6$ to $p_7$. Since $p_3$ and $p_4$ are parents of $p_6$, $g_1$ is passed down from one parent and $g_2$ is passed down from the other parent. It is infeasible to pass down both $g_1$ and $g_2$ from $p_6$ to $p_7$. In other words, there are no corresponding inheritance paths for the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with a 2-overlap edge between $\langle P_{Ab}, P_{Ac}\rangle$ (i.e., Case 6: $XY$). Therefore, such path-triples have no contribution to $\Phi_{abc}$.

Figure 6(b) shows one example of eligible inheritance paths corresponding to a pedigree graph. Each individual is represented by two allele nodes. The eligible inheritance paths in Figure 6(b) consist of red edges only.

Only Case 1, Case 2, and Case 3 do not have $Y$ in the regular expression representation of a path-pair (see (7)). Considering the scenarios $S_1$–$S_3$ shown in Figure 5, an edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$.

3.1.3. Constructing Cases for a Path-Triple. For the scenarios $S_1$–$S_3$ in Figure 5, we define two building blocks $\{B_1, B_2\}$, along with some rules in Figure 7, to generate acceptable cases. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, we cannot allow both edges to be root overlap, because if two edges are root overlap, then $P_{Aa}$ and $P_{Ac}$ must share at least one common individual except $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

Figure 7: Building blocks $\{B_1, B_2\}$ and basic rules. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, there can be at most one edge belonging to root overlap (either $T$ or $TX$).

Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ to $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples, denoted as $R_i$.

Figure 8: A graphical illustration for obtaining $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $S_3$ (with edges $e_1$, $e_2$, $e_3$) is decomposed into $u_1$, $u_2$, $u_3$. Note: $R_i$ denotes all acceptable path-triples for $u_i$.

Considering the dependency among $R_1$, $R_2$, $R_3$, we use the natural join operator, denoted as $\bowtie$, operating on $R_1$, $R_2$, $R_3$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ in the scenario $S_3$.

For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$. The scenario $S_0$ has no edges, which shows that $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or

(2) one edge belongs to root 2-overlap, and the remaining $(k-1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but zero 2-overlap paths.
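The natural join used to obtain $T_3$ can be illustrated with a small sketch. It is not the authors' code: relations are modeled as lists of dictionaries, the edge names `e1`, `e2`, `e3` follow Figure 8, and the assignment of edges to $u_1$, $u_2$, $u_3$ is an assumption made for illustration. Each $B_2$ relation keeps the labelings with at most one root-overlap edge ($T$ or $TX$).

```python
from itertools import product

LABELS = ("T", "X", "TX")   # Case 1, Case 2, Case 3
ROOT = {"T", "TX"}          # labels that involve a root 2-overlap


def b2_relation(edge1, edge2):
    """Acceptable labelings of a B2 block: at most one root-overlap edge."""
    return [
        {edge1: l1, edge2: l2}
        for l1, l2 in product(LABELS, repeat=2)
        if not (l1 in ROOT and l2 in ROOT)
    ]


def natural_join(r, s):
    """Join two relations (lists of dicts) on their common attributes."""
    joined = []
    for row_r, row_s in product(r, s):
        common = row_r.keys() & row_s.keys()
        if all(row_r[k] == row_s[k] for k in common):
            joined.append({**row_r, **row_s})
    return joined


if __name__ == "__main__":
    # Assumed decomposition of the triangle S3 into three B2 blocks.
    R1 = b2_relation("e1", "e2")
    R2 = b2_relation("e2", "e3")
    R3 = b2_relation("e1", "e3")
    T3 = natural_join(natural_join(R1, R2), R3)
    for row in T3:
        print(row)
```

With these assumptions each $R_i$ has five rows and $T_3$ has seven: one labeling with three crossover edges and six labelings with exactly one root-overlap edge, which matches the two options listed for $S_k$.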

3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals to a noncrossover path-triple without changing the contribution from this path-triple to $\Phi_{abc}$. The main purpose of using the splitting operator is to simplify the path-counting formula derivation process. We first use an example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ to two nodes $s_1$ and $s_2$;

(2) transform the edges $s \to a'$ and $s \to b'$ to $s_1 \to a'$ and $s_2 \to b'$, respectively;

(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
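The three steps above can be sketched on an edge-set representation of the pedigree graph. This is a hedged illustration rather than the authors' implementation: the function name `split_crossover` is hypothetical, the graph is a set of (parent, child) pairs, and it is assumed that $s$ has exactly two parents ($x$ on $P_{Aa}$ and $w$ on $P_{Ab}$, as drawn in Figure 9) and no children other than $a'$ and $b'$.

```python
def split_crossover(edges, s, parent_a, parent_b, child_a, child_b):
    """Return a new edge set in which crossover node s is split in two.

    Assumptions (for illustration only): s has parents parent_a and
    parent_b and children child_a and child_b, and no other edges.
    """
    s1, s2 = f"{s}1", f"{s}2"
    # Step (1): drop s; it is replaced by the two nodes s1 and s2.
    new_edges = {(u, v) for (u, v) in edges if s not in (u, v)}
    # Incoming edges, following Figure 9: x -> s1 and w -> s2.
    new_edges |= {(parent_a, s1), (parent_b, s2)}
    # Step (2): s -> a' and s -> b' become s1 -> a' and s2 -> b'.
    new_edges |= {(s1, child_a), (s2, child_b)}
    # Step (3): add the two new edges s2 -> a' and s1 -> b'.
    new_edges |= {(s2, child_a), (s1, child_b)}
    return new_edges


if __name__ == "__main__":
    g = {("A", "x"), ("A", "w"), ("A", "c"),
         ("x", "s"), ("w", "s"), ("s", "a'"), ("s", "b'")}
    for edge in sorted(split_crossover(g, "s", "x", "w", "a'", "b'")):
        print(edge)
```

After the split, the path through $x$ reaches $a'$ via $s_1$ and the path through $w$ reaches $b'$ via $s_2$, so the two paths no longer meet at a single crossover node.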

Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k+1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.

Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If there is a new crossover node appearing, the only possible node is either $a'$ or $b'$. Assume $b'$ becomes a crossover individual; it means that $b'$ is able to reach $a$ and $b$ from two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.

Next, we introduce a canonical graph, which results from applying the splitting operator for all crossover individuals. The canonical graph has zero crossover individuals.

Definition 5 (Canonical Graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, if there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$ for $\Phi_{abc}$;

(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$ for $\Phi_{abc}$,

we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.

Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, there exists a canonical graph $G'$ for $G$.

Figure 9: Transforming pedigree graph $G_{k+1}$ having $k+1$ crossover to $G_k$ having $k$ crossover. For $G_{k+1}$, the path-triple is $P_{Aa}: A \to \cdots \to x \to s \to a' \to \cdots \to a$; $P_{Ab}: A \to \cdots \to w \to s \to b' \to \cdots \to b$; $P_{Ac}: A \to c$. For $G_k$, the path-triple is $P_{Aa}: A \to \cdots \to x \to s_1 \to a' \to \cdots \to a$; $P_{Ab}: A \to \cdots \to w \to s_2 \to b' \to \cdots \to b$; $P_{Ac}: A \to c$. The figure distinguishes ancestor-descendant relationships from parent-child relationships.

Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ (scenarios $S_0$–$S_{10}$).

Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k+1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4. By the induction hypothesis, $G_k$ has a canonical graph $G'$; since the splitting operator does not change the contribution of a path-triple to $\Phi_{abc}$, $G'$ is also a canonical graph for $G_{k+1}$.

3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:

$$
\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AA} \right), \tag{12}
$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

$$
L_{\text{triple}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2},
\end{cases}
\tag{13}
$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).

For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
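As a worked illustration of (12) and (13), the contribution of a single acceptable path-triple can be evaluated with exact fractions. This is a sketch under stated assumptions, not the authors' code: the function names are hypothetical, path lengths are counted in edges, and enumerating and classifying the path-triples is assumed to have been done elsewhere.

```python
from fractions import Fraction


def phi_AA(F_A):
    """Kinship coefficient of A with itself: (1/2)(1 + F_A)."""
    return Fraction(1, 2) * (1 + F_A)


def phi_AAA(F_A):
    """Generalized kinship coefficient of A taken three times: (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * F_A)


def triple_contribution(len_a, len_b, len_c, F_A=Fraction(0), root_overlap_len=None):
    """Contribution of one acceptable path-triple to Phi_abc, following (12)-(13).

    root_overlap_len is None for Type 1 (no root 2-overlap); otherwise it is
    the length of the root 2-overlap path P_As (Type 2).
    """
    if root_overlap_len is None:
        L = len_a + len_b + len_c                       # Type 1 in (13)
        return Fraction(1, 2) ** L * phi_AAA(F_A)
    L = len_a + len_b + len_c - root_overlap_len        # Type 2 in (13)
    return Fraction(1, 2) ** (L + 1) * phi_AA(F_A)


if __name__ == "__main__":
    # Type 1: a, b, c are all children of a non-inbred ancestor A.
    print(triple_contribution(1, 1, 1))                      # 1/32
    # Type 2: A -> s -> a, A -> s -> b, A -> c, with root overlap A -> s.
    print(triple_contribution(2, 2, 1, root_overlap_len=1))  # 1/64
```

In the first example $L_{\text{triple}} = 3$ and $\Phi_{AAA} = 1/4$, giving $(1/2)^3 \cdot 1/4 = 1/32$; in the second, $L_{\text{triple}} = 2 + 2 + 1 - 1 = 4$ and $\Phi_{AA} = 1/2$, giving $(1/2)^5 \cdot 1/2 = 1/64$. Summing such contributions over all triple-common ancestors and all acceptable path-triples gives $\Phi_{abc}$.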

3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with $Quad\_C(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios $S_0$–$S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks $\{B_1, B_2, B_3\}$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap, because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that, for $B_3$, if $Tri\_C(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $Tri\_C(P_{Aa}, P_{Ab}, P_{Ac}) \ne \emptyset$.

3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ to one or multiple building blocks. A scenario $S_i \in \{S_1, S_3\}$ has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$, because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.

Figure 11: Building blocks $\{B_1, B_2, B_3\}$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. For $B_3$, $Tri\_C(P_{Aa}, P_{Ab}, P_{Ac}) \ne \emptyset$ and all three edges belong to root overlap (i.e., having root 3-overlap).

Table 2: Largest subgraph of a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$).

$S_i$    $S_4$    $S_5$    $S_7$    $S_8$    $S_9$    $S_{10}$
$S_j$    $S_3$    $S_3$    $S_6$    $S_5$    $S_7$    $S_9$

For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;

(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;

(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.

For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply the same idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;

(2) otherwise, there can be at most two root 2-overlap paths.

3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$\Phi_{abcd} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{quad}}}\Phi_{AAAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{quad}}+1}\Phi_{AAA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{quad}}+2}\Phi_{AA}\right), \tag{14}$$

where $\Phi_{AA} = \frac{1}{2}(1 + F_A)$, $\Phi_{AAA} = \frac{1}{4}(1 + 3F_A)$, $\Phi_{AAAA} = \frac{1}{8}(1 + 7F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$.

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;

Case 2: one root 3-overlap path $P_{At}$ ending at $t$;

Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$, ending at $s$ and $t$, respectively.

$$L_{\text{quad}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2}\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \ast L_{P_{At}} & \text{for Case 2} \in \text{Type 3}\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}
\end{cases} \tag{15}$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).

For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
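As a worked illustration of (14) and (15), the following minimal sketch evaluates the contribution of a single path-quad, given its path lengths and its type. The function name `quad_contribution`, the string labels for the types, and the parameter names are assumptions introduced for this example; enumerating and classifying the path-quads themselves is not shown.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_AA(F):    # (1/2)(1 + F_A)
    return Fraction(1, 2) * (1 + F)

def phi_AAA(F):   # (1/4)(1 + 3 F_A)
    return Fraction(1, 4) * (1 + 3 * F)

def phi_AAAA(F):  # (1/8)(1 + 7 F_A)
    return Fraction(1, 8) * (1 + 7 * F)

def quad_contribution(path_lengths, kind, F_A=Fraction(0),
                      s_len=0, s2_len=0, t_len=0):
    """Contribution of one path-quad to Phi_abcd, following (14) and (15).

    path_lengths: lengths of P_Aa, P_Ab, P_Ac, P_Ad.
    kind: 'type1', 'type2', 'type3_case1', 'type3_case2', 'type3_case3'.
    s_len, s2_len: lengths of root 2-overlap paths; t_len: root 3-overlap path.
    """
    total = sum(path_lengths)
    if kind == "type1":
        return HALF ** total * phi_AAAA(F_A)
    if kind == "type2":
        return HALF ** (total - s_len + 1) * phi_AAA(F_A)
    if kind == "type3_case1":
        l_quad = total - s_len - s2_len
    elif kind == "type3_case2":
        l_quad = total - 2 * t_len
    elif kind == "type3_case3":
        l_quad = total - t_len - s_len
    else:
        raise ValueError(f"unknown path-quad type: {kind}")
    return HALF ** (l_quad + 2) * phi_AA(F_A)

# Four children of a non-inbred ancestor A, no overlap (Type 1).
example_type1 = quad_contribution([1, 1, 1, 1], "type1")
# Two paths share a root 2-overlap path of length 1 (Type 2).
example_type2 = quad_contribution([2, 2, 1, 1], "type2", s_len=1)
```

Exact rational arithmetic is used so that the sketch can be checked by hand against the formula.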


Figure 12: Examples of 2-pair-path-quads $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ for $\Phi_{ab,cd}$. (a) $A \to s \to a$, $A \to s \to b$, $A \to t \to c$, $A \to t \to d$. (b) $A \to x \to a$, $A \to x \to d$, $A \to y \to b$, $A \to y \to c$.

3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two pairs of path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td})\rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b\rangle$ and $\langle c, d\rangle$, if $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathit{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathit{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.

Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$; $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; and $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$; $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; and $A \to x$ and $A \to y$ are root heter-overlap paths.

3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$, shown in Figure 13. The options for an edge can be $\{T, X, TX\}$ (refer to Section 3.1.1 for the definitions of $T$, $X$, and $TX$). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap
 | Zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap
 | Zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \neq h$
 | One root homo-overlap and two root heter-overlaps, and $h = r_1 \neq r_2$

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
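The generation function can be sketched directly from its definition. The dictionary layout (`parents` mapping each individual to a tuple of its known parents) and the helper name `make_gen` are assumptions made for this illustration only.

```python
from functools import lru_cache

def make_gen(parents):
    """Build gen() for a pedigree given as {individual: tuple_of_parents}.

    Progenitors have an empty tuple (or are absent from the mapping).
    """
    @lru_cache(maxsize=None)
    def gen(individual):
        known = parents.get(individual, ())
        if not known:
            return 0                              # progenitor
        return max(gen(p) for p in known) + 1     # one or two parents
    return gen

# Small hypothetical pedigree: f and m are progenitors.
pedigree = {"f": (), "m": (), "a": ("f", "m"), "b": ("a",), "c": ("b", "m")}
gen = make_gen(pedigree)
```

Here `gen("c")` is determined by the deeper parent `b`, not by the progenitor `m`, which matches the MAX in the definition. The recursion is memoized, so each individual is evaluated once; for very deep pedigrees an iterative traversal would avoid Python's recursion limit.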

The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$\Phi_{ab,cd} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{2-pair}}}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1}\Phi_{AAA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{2-pair}}+2}\Phi_{AA} + \sum_{\text{Type 4}}\left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1}\Phi_{AA}\right) + \sum_{(S,T) \in \text{Type 5}}\left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb}\rangle} + L_{\langle P_{Tc}, P_{Td}\rangle} + 1}\Phi_{BB}, \tag{16}$$

where $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$: a common ancestor of $a$ and $b$; and $T$: a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).


Figure 13: Scenarios $S_0$–$S_{16}$ of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ at path-pair level.

Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

Type 3: (17)

- zero root homo-overlap and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively;
- one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, and $r_1 \neq r_2$.

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td})\rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb}\rangle$ has zero overlap individual; $\langle P_{Tc}, P_{Td}\rangle$ has zero overlap individual. At most one path-pair (either $\langle P_{Sa}, P_{Sb}\rangle$ or $\langle P_{Tc}, P_{Td}\rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb}\rangle$ and a path from $\langle P_{Tc}, P_{Td}\rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$B = \begin{cases}
S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T)\\
S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}\\
T & \text{otherwise}
\end{cases}$$

$$L_{\text{2-pair}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \ast L_{P_{Ah}} & \text{for Type 4}
\end{cases}$$

$$L_{\langle P_{Sa}, P_{Sb}\rangle} = L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5}, \qquad L_{\langle P_{Tc}, P_{Td}\rangle} = L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}. \tag{18}$$
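The Type 5 term of (16) combines the choice of $B$ with the two path-pair lengths. The following is a minimal sketch of that single term, assuming the generations and parent counts are already available in dictionaries; the names `choose_B`, `type5_contribution`, `gen_of`, and `n_parents` are hypothetical and introduced only for this example.

```python
from fractions import Fraction

def choose_B(S, T, gen_of, n_parents):
    """Select B from S and T according to the case definition of B."""
    if gen_of[S] < gen_of[T]:
        return S
    if gen_of[S] == gen_of[T] and n_parents[T] == 2:
        return S
    return T

def type5_contribution(l_sa, l_sb, l_tc, l_td, F_B=Fraction(0)):
    """One Type 5 term: (1/2)^(L<P_Sa,P_Sb> + L<P_Tc,P_Td> + 1) * Phi_BB."""
    phi_BB = Fraction(1, 2) * (1 + F_B)      # Phi_BB = (1/2)(1 + F_B)
    exponent = (l_sa + l_sb) + (l_tc + l_td) + 1
    return Fraction(1, 2) ** exponent * phi_BB

# Hypothetical example: S is one generation above T.
gen_of = {"S": 1, "T": 2}
n_parents = {"S": 2, "T": 2}
B = choose_B("S", "T", gen_of, n_parents)
term = type5_contribution(1, 1, 1, 1)
```

With four paths of length 1 and a non-inbred $B$, the term evaluates to $(1/2)^5 \cdot (1/2) = 1/64$.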

Note that if $\langle a, b\rangle$ and $\langle c, d\rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}}\left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb}\rangle} + L_{\langle P_{Tc}, P_{Td}\rangle}}\Phi_{SS} \ast \Phi_{TT}. \tag{19}$$

Type 6: $\langle P_{Sa}, P_{Sb}\rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td}\rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb}\rangle$ and a path from $\langle P_{Tc}, P_{Td}\rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb}\rangle}$ and $L_{\langle P_{Tc}, P_{Td}\rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.

3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by comparing it with the performance of the recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node are a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with in-degree 0). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $\mathrm{NC}(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows.

(1) If $u$ is $r$, then $\mathrm{NC}(r)$ contains only one element, the empty string.

(2) Otherwise, let $u$ be a node with $\mathrm{NC}(u)$, and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in $\mathrm{NC}(u)$, a code $xi{*}$ is added to $\mathrm{NC}(v_i)$, where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
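The assignment rules above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in [11, 18]: the gender marker is encoded as a hypothetical `'m'` or `'f'` character, and nodes are processed in topological order (a variant of the breadth-first traversal) so that every parent's code set is complete before it is propagated to a child with two parents.

```python
from collections import deque

def assign_nodecodes(children, gender, root):
    """Assign NodeCodes to every node reachable from root.

    children: {node: ordered list of children (sibling order)}
    gender:   {node: 'm' or 'f'} (hypothetical encoding of the '*' marker)
    Returns {node: set of code strings}.
    """
    indegree = {}
    for u, kids in children.items():
        indegree.setdefault(u, 0)
        for v in kids:
            indegree[v] = indegree.get(v, 0) + 1

    codes = {u: set() for u in indegree}
    codes[root].add("")                       # rule (1): NC(r) = {""}

    queue = deque(u for u, d in indegree.items() if d == 0)
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            for x in codes[u]:                # rule (2): add x + i + gender
                codes[v].add(f"{x}{i}{gender[v]}")
            indegree[v] -= 1
            if indegree[v] == 0:              # all parents of v are done
                queue.append(v)
    return codes

# Hypothetical pedigree: r has children p and q, who are both parents of a.
children = {"r": ["p", "q"], "p": ["a"], "q": ["a"], "a": []}
gender = {"p": "m", "q": "f", "a": "f"}
nodecodes = assign_nodecodes(children, gender, "r")
```

Each code of `a` spells out one path from `r`, so the number of codes of a node equals the number of distinct paths reaching it from the progenitor.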


The computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20] to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examined the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.

Figure 14: The effect of pedigree size on computation efficiency improvement (x-axis: individuals in pedigree, 77 to 195197; y-axis: average time (ms); series: Recursive, NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (x-axis: depth, 4 to 18; y-axis: average time (ms); series: Recursive, NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.

Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2}\rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.


Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ (scenarios $S_0$ to $S_3$; if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the two paths are merged into $P_{Aa}$).

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2}\rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$, by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$\Phi_{aab} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}\right), \tag{A.1}$$

where $A$: a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has no root 2-overlap.

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair.

$$L_{\text{triple}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}
\end{cases}$$

$$L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.2}$$

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.

A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

$$\Phi_{aabc} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{quad}}-1}\Phi_{AAAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{quad}}}\Phi_{AAA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{quad}}+1}\Phi_{AA}\right) + \sum_{A}\left(\sum_{\text{Type 4}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+1}\Phi_{AAA} + \sum_{\text{Type 5}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+2}\Phi_{AA}\right), \tag{A.3}$$

where $A$: a triple-common ancestor of $a$, $b$, and $c$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3: (A.4)

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;

Case 2: one root 3-overlap path $P_{At}$ ending at $t$;

Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$, ending at $s$ and $t$, respectively.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path.

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$L_{\text{quad}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}
\end{cases}$$

$$L_{\text{triple}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}
\end{cases} \tag{A.5}$$


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula for $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$, and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must therefore be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the default mergeable path-pair. Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

$$\Phi_{aaab} = \sum_{A}\left(\frac{3}{2}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\right) + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\right), \tag{A.6}$$

where $A$: a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path.

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.

$$L_{\text{triple}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}\\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}
\end{cases}$$

$$L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.7}$$

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$).

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases: (B.1)

Cases for $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.

Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

Cases for $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges, $n$, in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Legend: parent-child relationship; ancestor-descendant relationship.)

B11 Correctness Proof for Case 31

Case 31 ForΦ119886119887119888

119866 does not have any path triples ⟨119875119860119886 119875119860119887

119875119860119888⟩ with root overlap

Proof. (Basis) There are two basic scenarios among $a$, $b$, and $c$: (i) one individual is a parent of another; (ii) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$, for Figure 18(a) we obtain $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b) we obtain $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^{2}\Phi_{AAc} = (1/2)^{3}\Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows:

$$\sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}, \quad \text{where } L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}.$$

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}}\Phi_{ccc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}}\Phi_{AAA} = (1/2)^{3}\Phi_{AAA}$.
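The basis values above can be checked numerically with a minimal sketch of the recursive approach. The three-individual rules follow formulas (3) and (4) as they are used in this appendix; the rules for $\Phi_{aa}$, $\Phi_{ab}$, and $\Phi_{aaa}$ are assumed to be Karigl's standard recursions [10], since formula (5) is not restated here. The pedigree encoding, the numeric IDs, and the function names `phi2` and `phi3` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical encoding of Figure 18(b): individual -> (father, mother); 0 = unknown.
# Assumption: IDs are numbered so that every parent has a smaller ID than its children.
PEDIGREE = {1: (0, 0), 2: (1, 0), 3: (1, 0), 4: (1, 0)}  # A=1, a=2, b=3, c=4


@lru_cache(maxsize=None)
def phi2(a, b):
    """Kinship coefficient of two individuals (assumed standard recursion)."""
    if a == 0 or b == 0:
        return Fraction(0)
    a, b = max(a, b), min(a, b)  # recurse on the individual with the larger ID
    f, m = PEDIGREE[a]
    if a == b:
        return (1 + phi2(f, m)) / 2
    return (phi2(f, b) + phi2(m, b)) / 2


@lru_cache(maxsize=None)
def phi3(a, b, c):
    """Generalized kinship coefficient of three individuals."""
    if 0 in (a, b, c):
        return Fraction(0)
    a, b, c = sorted((a, b, c), reverse=True)  # a is never an ancestor of b or c
    f, m = PEDIGREE[a]
    if a == b == c:
        # Assumed form of formula (5): Phi_aaa = (1 + 3 * Phi_fm) / 4.
        return (1 + 3 * phi2(f, m)) / 4
    if a == b:
        # Formula (4): Phi_aab = (Phi_ab + Phi_fmb) / 2.
        return (phi2(a, c) + phi3(f, m, c)) / 2
    # Formula (3): Phi_abc = (Phi_fbc + Phi_mbc) / 2.
    return (phi3(f, b, c) + phi3(m, b, c)) / 2


print(phi3(2, 3, 4))  # (1/2)^3 * Phi_AAA for Figure 18(b)
```

With a non-inbred founder $A$, the sketch gives $\Phi_{AAA} = 1/4$, so the printed value agrees with $(1/2)^{3}\Phi_{AAA}$.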

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are the parents of $a$. In $G$, $a$ has only one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AAA}, \\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}.
\end{aligned}
\tag{B.2}
$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc}$.

Thus, it is true for $n = k + 1$.

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof. (Basis) There are three basic scenarios among $a$, $b$, and $c$: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another.

Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20, we obtain: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^{2}\Phi_{ssc} = (1/2)^{3}\Phi_{sc} = (1/2)^{5}\Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows:

$$\sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}, \quad \text{where } L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$$

and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle} + 1}\Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^{4}\Phi_{AA}$ and $\Phi_{abc} = (1/2)^{5}\Phi_{AA}$, respectively.
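The two contribution rules used in the basis steps (Type 1 without root overlap, Type 2 with root overlap) can be written as a short sketch. The function names are hypothetical, and the founder values $\Phi_{AA} = 1/2$ and $\Phi_{AAA} = 1/4$ assume a non-inbred ancestor.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def type1_contribution(len_a, len_b, len_c, phi_AAA):
    """Path-triple with no root overlap: (1/2)^L * Phi_AAA, L = sum of path lengths."""
    return HALF ** (len_a + len_b + len_c) * phi_AAA


def type2_contribution(len_a, len_b, len_c, len_overlap, phi_AA):
    """Path-triple with root overlap: (1/2)^(L + 1) * Phi_AA,
    where L subtracts the length of the root overlap path P_As."""
    length = len_a + len_b + len_c - len_overlap
    return HALF ** (length + 1) * phi_AA


# Figure 20(a): P_ca = c->b->a, P_cb = c->b, P_cc has length 0; root overlap c->b.
fig_20a = type2_contribution(2, 1, 0, 1, Fraction(1, 2))
# Figure 20(c): P_Aa = A->s->a, P_Ab = A->s->b, P_Ac = A->c; root overlap A->s.
fig_20c = type2_contribution(2, 2, 1, 1, Fraction(1, 2))
print(fig_20a, fig_20c)
```

The exponents produced here are 3 and 5, matching $(1/2)^{3}\Phi_{cc}$ and $(1/2)^{5}\Phi_{AA}$ in the text.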

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$ (in Figure 21(a), $p = f$). Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$ which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA}, \\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}.
\end{aligned}
\tag{B.3}
$$

For Figures 21(b) and 21(c), we take the same steps as in the calculation of $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ to $\Phi_{aab}$ can be computed as follows:

$$\sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}, \quad \text{where } L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}}.$$

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ has only one parent $f$,

$$\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb}) = \frac{1}{2}(\Phi_{ab} + 0) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B.4}$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, this shows that the path-counting formula (A.1) is true for Case 2.3.
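The reduction of Case 2.3 to Wright's formula can be illustrated with a small path-enumeration sketch. It assumes that "nonoverlapping" means the two paths share no individual other than the common ancestor; the graph encoding, the example pedigree (a chain $A \to f \to a$ plus $A \to b$, loosely modeled on Figure 22(a)), and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical pedigree as parent -> children edges: A -> f -> a and A -> b.
CHILDREN = {"A": ["f", "b"], "f": ["a"], "a": [], "b": []}
# Assumed self-kinship values Phi_XX = 1/2 (no inbreeding anywhere in this toy pedigree).
PHI_SELF = {x: Fraction(1, 2) for x in CHILDREN}


def paths(children, src, dst):
    """All directed paths from src to dst, each returned as a list of nodes."""
    if src == dst:
        return [[src]]
    return [[src] + rest
            for child in children[src]
            for rest in paths(children, child, dst)]


def wright_kinship(children, phi_self, a, b):
    """Sum (1/2)^(L_Pa + L_Pb) * Phi_AA over nonoverlapping path-pairs, for a != b."""
    total = Fraction(0)
    for anc in children:
        for pa in paths(children, anc, a):
            for pb in paths(children, anc, b):
                if set(pa) & set(pb) == {anc}:  # only the root is shared
                    length = (len(pa) - 1) + (len(pb) - 1)
                    total += Fraction(1, 2) ** length * phi_self[anc]
    return total


phi_ab = wright_kinship(CHILDREN, PHI_SELF, "a", "b")
phi_aab = phi_ab / 2  # (B.4): a has a single known parent
print(phi_ab, phi_aab)
```

Here the only path-pair has length $2 + 1 = 3$, so halving $\Phi_{ab}$ yields the same value as the Type 3 term $(1/2)^{3+1}\Phi_{AA}$.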

B.1.4. Correctness Proof for Cases 2.1 and 2.2

For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors

Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).

Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. The proof is by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.

Induction Step. Now we show that the claim is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. This means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a: A_1 \to \cdots \to a,\ t_b: A_1 \to \cdots \to b,\ t_c: A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
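The objects used in the induction (the set of triple-common ancestors and a topmost element $A_1$) can be computed with a minimal sketch. It assumes that an individual counts as its own ancestor, as in Figure 18(a) where $c$ is a triple-common ancestor of $a$, $b$, and $c$; the pedigree and the function names are hypothetical.

```python
# Hypothetical pedigree: child -> list of parents. A2 is a descendant of A1 (case (2)).
PARENTS = {"A2": ["A1"], "a": ["A1", "A2"], "b": ["A2"], "c": ["A2"]}


def ancestors(x):
    """Ancestors of x, including x itself (paths of length 0 are allowed)."""
    seen, stack = {x}, [x]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def triple_common_ancestors(a, b, c):
    """Individuals that can reach all of a, b, and c."""
    return ancestors(a) & ancestors(b) & ancestors(c)


def topmost(candidates):
    """Return a candidate that has no proper ancestor among the other candidates."""
    for x in sorted(candidates):
        if not (ancestors(x) - {x}) & candidates:
            return x
    return None


tri_c = triple_common_ancestors("a", "b", "c")
print(sorted(tri_c), topmost(tri_c))
```

Removing the returned individual and its outgoing edges gives the graph $G'$ of the induction step, whose triple-common ancestors are the remaining candidates.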

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals

Consider the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$; there are 15 cases for a pedigree graph $G$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ with zero root overlap.
$\Longleftarrow \Phi_{aaab}$

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with two root 2-overlap.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.
$\Longleftarrow \Phi_{aabc}$

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with two root 2-overlap.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap and one root 3-overlap.
$\Longleftarrow \Phi_{abcd}$ (C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

Figure 23: Dependency graph for different cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

C.2. Proof for Two Pairs of Individuals

Consider the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$; there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlap.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlap.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlap and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

Figure 24: Dependency graph for different cases for two pairs of individuals.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malecot, Les mathematiques de l'heredite, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identite en genetique," Annales de l'Institut Henri Poincare B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identite entre deux individus," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ (scenarios $S_0$, $S_1$, $S_2$, $S_3$ over the nodes $P_{Aa}$, $P_{Ab}$, $P_{Ac}$).

where $s$, $e$, $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path.

$$A \to s \to e \to t \to a, \qquad A \to s \to f \to t \to b \ \in\ TX, \tag{9}$$

where $s$ is a 2-overlap individual, the overlap path is a root 2-overlap path, and $t$ is a crossover individual.

$$A \to s \to e \to t \to a, \qquad A \to d \to f \to t \to b \ \in\ X, \tag{10}$$

where $t$ is a crossover individual.

$$A \to c \to e \to t \to a, \qquad A \to s \to e \to t \to b \ \in\ XY, \tag{11}$$

where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.
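The three examples (9)-(11) can be reproduced with a small classifier sketch. It is inferred from these examples only, not from the full definition in (7): a shared prefix of edges is labeled `T`, a shared individual reached through different edges is labeled `X`, and a shared edge outside the root prefix is labeled `Y`. The function name `classify` and the list encoding of paths are hypothetical.

```python
def classify(p, q):
    """Label a path-pair (two node lists starting at the same ancestor) with T, X, Y symbols."""
    # Length of the common prefix of edges (root overlap path).
    i = 0
    while i + 1 < min(len(p), len(q)) and p[i + 1] == q[i + 1]:
        i += 1
    symbols = ["T"] if i > 0 else []

    # Predecessor of every non-root node on q.
    pred_q = {v: u for u, v in zip(q, q[1:])}

    for j in range(i + 1, len(p)):
        u, v = p[j - 1], p[j]
        if v not in pred_q:
            continue  # v is not shared with q
        if pred_q[v] == u:
            # Same incoming edge on both paths: a 2-overlap edge outside the root prefix.
            if not symbols or symbols[-1] != "Y":
                symbols.append("Y")
        else:
            # Shared individual entered through different edges: a crossover.
            symbols.append("X")
    return "".join(symbols)


print(classify(["A", "s", "e", "t", "a"], ["A", "s", "f", "t", "b"]))  # example (9)
print(classify(["A", "s", "e", "t", "a"], ["A", "d", "f", "t", "b"]))  # example (10)
print(classify(["A", "c", "e", "t", "a"], ["A", "s", "e", "t", "b"]))  # example (11)
```

Two fully independent paths yield the empty string under this encoding.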

3.1.2. Path-Pair Level Graphical Representation of a Path-Triple

Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, we represent each path as a node. The path-triple can be decomposed into three path-pairs (i.e., $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual) except $A$, then there is an edge between the two nodes representing the two paths. Therefore, we obtain four different scenarios $S_0$–$S_3$, shown in Figure 5.

In Figure 5, the scenario $S_0$ has no edges, which means that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths. In Figure 2, path-triple 1 is an example of $S_0$. Next, we introduce a lemma which can assist with identifying the options for the edges in the scenarios $S_1$–$S_3$.
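The scenario of a given path-triple follows directly from this construction: count the path-pairs that share an individual other than the root. A minimal sketch, assuming each path is given as a list of individuals starting at $A$ and that $S_k$ is the scenario with $k$ edges; the function name is hypothetical.

```python
from itertools import combinations


def scenario(triple):
    """Return k such that the path-triple falls under scenario S_k.

    An edge is drawn between two paths when they share at least one
    individual other than the common root at index 0.
    """
    k = 0
    for p, q in combinations(triple, 2):
        if set(p[1:]) & set(q[1:]):
            k += 1
    return k


independent = (["A", "a"], ["A", "b"], ["A", "c"])
shared_bc = (["A", "a"],
             ["A", "p3", "p6", "p7", "b"],
             ["A", "p4", "p6", "p7", "c"])
print(scenario(independent), scenario(shared_bc))
```

The first triple has no edges ($S_0$); in the second, only the paths to $b$ and $c$ share individuals, so it falls under $S_1$.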

Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. If there is a 2-overlap edge, which is represented by $Y$ in the regular expression representation, in any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no contribution to $\Phi_{abc}$.

Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at allele-level starting from a triple-common ancestor $A$ to the three individuals $a$, $b$, and $c$.

Figure 6: Examples of pedigree and inheritance paths. (a) Pedigree; (b) inheritance paths.

For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ listed as follows: $P_{Aa}$: $A \to a$; $P_{Ab}$: $A \to p_3 \to p_6 \to p_7 \to b$; $P_{Ac}$: $A \to p_4 \to p_6 \to p_7 \to c$.

For $\langle P_{Ab}, P_{Ac} \rangle$, $p_6$ is a crossover individual, $p_7$ is an overlap individual, and $p_6 \to p_7$ is a 2-overlap edge, represented by $Y$ in the regular expression representation (see the definition for $Y$ in Section 3.1.1).

For the individual $p_6$, let us denote the two alleles at one fixed autosomal locus as $g_1$ and $g_2$. At allele-level, only one allele can be passed down from $p_6$ to $p_7$. Since $p_3$ and $p_4$ are the parents of $p_6$, $g_1$ is passed down from one parent and $g_2$ is passed down from the other parent. It is infeasible to pass down both $g_1$ and $g_2$ from $p_6$ to $p_7$. In other words, there are no corresponding inheritance paths for the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with a 2-overlap edge between $\langle P_{Ab}, P_{Ac} \rangle$ (i.e., Case 6: $XY$). Therefore, such path-triples have no contribution to $\Phi_{abc}$.
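The exclusion rule of Lemma 3 can be sketched as a filter over path-triples. The check below encodes only this lemma (no path-pair may share an edge outside its common root prefix) and not the further rules on acceptable cases; the function names and the list encoding of paths are hypothetical.

```python
from itertools import combinations


def edges(path):
    """Directed edges of a path given as a list of individuals."""
    return list(zip(path, path[1:]))


def root_overlap_edges(p, q):
    """Edges of the common prefix of two paths that start at the same ancestor."""
    shared = set()
    for edge_p, edge_q in zip(edges(p), edges(q)):
        if edge_p != edge_q:
            break
        shared.add(edge_p)
    return shared


def has_two_overlap_edge(p, q):
    """True if the pair shares an edge that is not part of the root overlap path."""
    common = set(edges(p)) & set(edges(q))
    return bool(common - root_overlap_edges(p, q))


def passes_lemma3(triple):
    """A path-triple can contribute to Phi_abc only if no pair has a 2-overlap edge."""
    return not any(has_two_overlap_edge(p, q) for p, q in combinations(triple, 2))


figure6_triple = (["A", "a"],
                  ["A", "p3", "p6", "p7", "b"],
                  ["A", "p4", "p6", "p7", "c"])
print(passes_lemma3(figure6_triple))  # the pair (P_Ab, P_Ac) shares the edge p6 -> p7
```

A triple whose only shared edges lie on a root overlap path, such as $A \to s \to a$, $A \to s \to b$, $A \to c$, passes this check.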

Figure 6(b) shows one example of eligible inheritance paths corresponding to a pedigree graph. Each individual is represented by two allele nodes. The eligible inheritance paths in Figure 6(b) consist of red edges only.

Only Case 1, Case 2, and Case 3 do not have $Y$ in the regular expression representation of a path-pair (see (7)). Considering the scenarios $S_1$–$S_3$ shown in Figure 5, an edge can therefore have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$.

3.1.3. Constructing Cases for a Path-Triple

For the scenarios $S_1$–$S_3$ in Figure 5, we define two building blocks $\{B_1, B_2\}$, along with some rules in Figure 7, to generate acceptable cases. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, we cannot allow both edges to be root overlap, because if two edges are root overlap, then $P_{Aa}$ and $P_{Ac}$ must share at least one common individual except $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

Figure 7: Building blocks $\{B_1, B_2\}$ and basic rules. For $B_1$, the edge can have three options: case 1: $T$; case 2: $X$; case 3: $TX$. For $B_2$, there can be at most one edge belonging to root overlap (either $T$ or $TX$).

Figure 8: A graphical illustration for obtaining $T_3 = R_1 \bowtie R_2 \bowtie R_3$ from $S_3$ (edges $e_1$, $e_2$, $e_3$; blocks $u_1$, $u_2$, $u_3$). Note: $R_i$ denotes all acceptable path-triples for $u_i$.

Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ into $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples denoted as $R_i$.

Considering the dependency among 1198771 1198772 1198773 we use

the natural join operator denoted as ⋈ operating on 1198771

1198772 1198773 to generate all acceptable cases for 119878

3 As a result we

obtain 1198793= 1198771⋈ 1198772⋈ 1198773 where 119879

3denotes the acceptable

cases of the path-triple ⟨119875119860119886 119875119860119887 119875119860119888⟩ in the scenario 119878

3
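The join $T_3 = R_1 \bowtie R_2 \bowtie R_3$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the assignment of edges to blocks ($u_1$ on $e_1, e_2$; $u_2$ on $e_2, e_3$; $u_3$ on $e_3, e_1$) is hypothetical, as are the helper names `b2_relation` and `natural_join`. Each $R_i$ is modeled as a list of rows that map an edge name to one of the options $T$, $X$, $TX$, with the $B_2$ rule (at most one root-overlap edge) applied per block.

```python
from itertools import product

EDGE_OPTIONS = ("T", "X", "TX")   # options for a single edge
ROOT_OVERLAP = {"T", "TX"}        # options that involve root overlap


def b2_relation(edge_a, edge_b):
    """Acceptable cases for one B2 block: at most one root-overlap edge."""
    rows = []
    for opt_a, opt_b in product(EDGE_OPTIONS, repeat=2):
        if opt_a in ROOT_OVERLAP and opt_b in ROOT_OVERLAP:
            continue  # both edges root overlap: not acceptable for B2
        rows.append({edge_a: opt_a, edge_b: opt_b})
    return rows


def natural_join(left, right):
    """Natural join of two relations given as lists of dict rows."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                joined.append({**l_row, **r_row})
    return joined


# Hypothetical edge layout for u1, u2, u3 (an assumption, not from the paper).
R1 = b2_relation("e1", "e2")
R2 = b2_relation("e2", "e3")
R3 = b2_relation("e3", "e1")
T3 = natural_join(natural_join(R1, R2), R3)
```

Under these assumptions, every row of `T3` contains at most one root-overlap edge, which agrees with the rule that an acceptable path-triple has at most one root 2-overlap path.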

For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. The scenario $S_0$ has no edges, and it shows that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or
(2) one edge belongs to root 2-overlap, and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but zero 2-overlap paths.

3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals into a noncrossover path-triple without changing the contribution from this path-triple to $\Phi_{abc}$. The main purpose of using the splitting operator is to simplify the path-counting formula derivation process. We first use an example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ into two nodes $s_1$ and $s_2$;
(2) transform the edges $s \to a'$ and $s \to b'$ to $s_1 \to a'$ and $s_2 \to b'$, respectively;
(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
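The three steps above can be sketched on a child-adjacency map. This is a minimal illustration, not the authors' implementation: the function name `split_crossover`, the node names `s_1`/`s_2`, and the graph encoding are hypothetical. Two further assumptions are made explicit in the comments: the incoming edges of $s$ are redirected as drawn in Figure 9 ($x \to s_1$, $w \to s_2$), and $s$ is assumed to have no children other than $a'$ and $b'$.

```python
def split_crossover(children, s, a_child, b_child, parent_to_a, parent_to_b):
    """Apply the splitting operator to crossover node s.

    children: dict mapping each node to the set of its child nodes.
    Returns a new adjacency dict; the input is left unchanged.
    """
    graph = {node: set(kids) for node, kids in children.items()}
    if graph.get(s) != {a_child, b_child}:
        # Sketch limitation (assumption): s has exactly the children a', b'.
        raise ValueError("sketch only handles a node whose children are a' and b'")

    s1, s2 = f"{s}_1", f"{s}_2"
    # Step (1): split s into s1 and s2.
    del graph[s]
    # Step (2): s -> a' becomes s1 -> a', and s -> b' becomes s2 -> b'.
    # Step (3): add the new edges s2 -> a' and s1 -> b'.
    graph[s1] = {a_child, b_child}
    graph[s2] = {a_child, b_child}
    # Assumption taken from Figure 9: the parent on the path to a' now
    # points to s1, and the parent on the path to b' now points to s2.
    graph[parent_to_a].discard(s)
    graph[parent_to_a].add(s1)
    graph[parent_to_b].discard(s)
    graph[parent_to_b].add(s2)
    return graph


# Small pedigree shaped like Figure 9 (a1 and b1 stand for a' and b').
pedigree = {
    "A": {"x", "w", "c"},
    "x": {"s"}, "w": {"s"},
    "s": {"a1", "b1"},
    "a1": set(), "b1": set(), "c": set(),
}
split = split_crossover(pedigree, "s", "a1", "b1", "x", "w")
```

After the call, the paths $A \to x \to s_1 \to a'$ and $A \to w \to s_2 \to b'$ no longer share a node below $A$.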

Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.

Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If a new crossover node appears, the only possible node is either $a'$ or $b'$. Assume $b'$ becomes a crossover individual; this means that $b'$ is able to reach $a$ and $b$ from two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.

Next, we introduce a canonical graph, which results from applying the splitting operator for all crossover individuals. The canonical graph has zero crossover individuals.

Definition 5 (Canonical Graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, if there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$, and
(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$,

we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.

Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, there exists a canonical graph $G'$ for $G$.

Figure 9: Transforming pedigree graph $G_{k+1}$ having $k + 1$ crossovers to $G_k$ having $k$ crossovers. In $G_{k+1}$, the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of $A \to \cdots \to x \to s \to a' \to \cdots \to a$, $A \to \cdots \to w \to s \to b' \to \cdots \to b$, and $A \to c$. In $G_k$, it consists of $A \to \cdots \to x \to s_1 \to a' \to \cdots \to a$, $A \to \cdots \to w \to s_2 \to b' \to \cdots \to b$, and $A \to c$. (Legend: ancestor-descendant relationship; parent-child relationship.)

Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ (scenarios $S_0$–$S_{10}$).

Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4. By the induction hypothesis, $G_k$ has a canonical graph, which also serves as a canonical graph for $G_{k+1}$ because the splitting operator does not change the contribution of a path-triple to $\Phi_{abc}$.

3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now, we present the path-counting formula for $\Phi_{abc}$:

$$\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AA} \right), \tag{12}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \tag{13}$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).

For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
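The contribution of a single acceptable path-triple under (12) and (13) can be evaluated with exact rational arithmetic. The sketch below assumes that the path lengths, the inbreeding coefficient $F_A$, and (for Type 2) the length of the root 2-overlap path $P_{As}$ are already known; the function names are hypothetical, and enumerating the path-triples themselves is out of scope.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_AA(F_A):
    """Phi_AA = (1/2)(1 + F_A)."""
    return Fraction(1, 2) * (1 + F_A)


def phi_AAA(F_A):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * F_A)


def triple_contribution(len_a, len_b, len_c, F_A=Fraction(0), root_overlap_len=None):
    """Contribution of one path-triple <P_Aa, P_Ab, P_Ac> to Phi_abc.

    root_overlap_len is None for Type 1 (zero root 2-overlap); otherwise it
    is the length of the root 2-overlap path P_As (Type 2).
    """
    total = len_a + len_b + len_c
    if root_overlap_len is None:
        # Type 1: (1/2)^L_triple * Phi_AAA with L_triple = total.
        return HALF ** total * phi_AAA(F_A)
    # Type 2: (1/2)^(L_triple + 1) * Phi_AA with L_triple = total - L_PAs.
    l_triple = total - root_overlap_len
    return HALF ** (l_triple + 1) * phi_AA(F_A)


# Three paths of length 1 from a non-inbred ancestor, no root overlap.
type1 = triple_contribution(1, 1, 1)
# Paths of lengths 2, 2, 1 with a root 2-overlap path of length 1.
type2 = triple_contribution(2, 2, 1, root_overlap_len=1)
```

Summing such terms over all acceptable path-triples and all triple-common ancestors $A$ gives $\Phi_{abc}$ as in (12).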

3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ and $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios $S_0$–$S_{10}$ shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks $B_1$, $B_2$, $B_3$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap, because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that, for $B_3$, if $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.

3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ to one or multiple building blocks. For a scenario $S_i \in \{S_1, S_3\}$, it has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$ because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.

Figure 11: Building blocks $B_1$, $B_2$, $B_3$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For $B_3$, where $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$, all three edges belong to root overlap (i.e., having root 3-overlap).

Table 2: Largest subgraph $S_j$ of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).

$S_i$ | $S_4$ | $S_5$ | $S_7$ | $S_8$ | $S_9$ | $S_{10}$
$S_j$ | $S_3$ | $S_3$ | $S_6$ | $S_5$ | $S_7$ | $S_9$

For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;
(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;
(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.

For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then, we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply this idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;
(2) otherwise, there can be at most two root 2-overlap paths.

3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now, we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+2} \Phi_{AA} \right), \tag{14}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$.

Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$, $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$, ending at $s$ and $t$, respectively.

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases} \tag{15}$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).

For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
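As with the three-individual case, the term contributed by one acceptable path-quad under (14) and (15) can be evaluated directly once its type is known. The following sketch uses hypothetical names and assumes that the caller supplies the four path lengths, the lengths of any root 2-overlap paths, and the length of the root 3-overlap path if there is one.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def quad_contribution(path_lengths, F_A=Fraction(0),
                      two_overlap_lens=(), three_overlap_len=None):
    """Contribution of one path-quad <P_Aa, P_Ab, P_Ac, P_Ad> to Phi_abcd.

    path_lengths: the four path lengths.
    two_overlap_lens: lengths of the root 2-overlap paths (0, 1, or 2 items).
    three_overlap_len: length of the root 3-overlap path P_At, or None.
    """
    phi_aa = Fraction(1, 2) * (1 + F_A)
    phi_aaa = Fraction(1, 4) * (1 + 3 * F_A)
    phi_aaaa = Fraction(1, 8) * (1 + 7 * F_A)
    total = sum(path_lengths)
    n_two = len(two_overlap_lens)

    if three_overlap_len is None:
        l_quad = total - sum(two_overlap_lens)
        if n_two == 0:                       # Type 1
            return HALF ** l_quad * phi_aaaa
        if n_two == 1:                       # Type 2
            return HALF ** (l_quad + 1) * phi_aaa
        if n_two == 2:                       # Type 3, Case 1
            return HALF ** (l_quad + 2) * phi_aa
        raise ValueError("at most two root 2-overlap paths are acceptable")

    if n_two == 0:                           # Type 3, Case 2
        l_quad = total - 2 * three_overlap_len
    elif n_two == 1:                         # Type 3, Case 3
        l_quad = total - three_overlap_len - two_overlap_lens[0]
    else:
        raise ValueError("a root 3-overlap allows at most one root 2-overlap")
    return HALF ** (l_quad + 2) * phi_aa


# Four paths of length 1 from a non-inbred ancestor, no root overlap.
quad_type1 = quad_contribution([1, 1, 1, 1])
# Lengths 2, 2, 2, 1 with a root 3-overlap path of length 1.
quad_case2 = quad_contribution([2, 2, 2, 1], three_overlap_len=1)
```

The two `ValueError` branches mirror the two properties of an acceptable case listed in Section 3.2.2.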

Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with paths $A \to s \to a$, $A \to s \to b$, $A \to t \to c$, and $A \to t \to d$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with paths $A \to x \to a$, $A \to y \to b$, $A \to y \to c$, and $A \to x \to d$.

3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two pairs of path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathit{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathit{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.

Example 8. $A$ is a quad-common ancestor for $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$; $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; and $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$; $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; and $A \to x$ and $A \to y$ are root heter-overlap paths.

3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now, we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge can be $T$, $X$, or $TX$ (refer to Section 3.1.1 for definitions of $T$, $X$, and $TX$). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap; zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap; zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps and $r_1 = r_2 = h$; one root homo-overlap and two root heter-overlaps and $h = r_1 = r_2$

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define that the generation of a progenitor $p_i$ is 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents, $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
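The generation function defined above is a simple recursion over parents. A minimal memoized sketch follows; the `parents` mapping and the factory name `make_gen` are hypothetical, and very deep pedigrees might call for an iterative version instead.

```python
from functools import lru_cache


def make_gen(parents):
    """Build gen() for a pedigree given as {individual: tuple of parents}.

    Progenitors map to an empty tuple (or are absent from the mapping).
    """
    @lru_cache(maxsize=None)
    def gen(individual):
        known = parents.get(individual, ())
        if not known:
            return 0                          # progenitor: gen(p_i) = 0
        # One parent: gen(p) + 1; two parents: MAX{gen(f), gen(m)} + 1.
        return max(gen(p) for p in known) + 1

    return gen


# Small example: f has one parent; a has parents f and the progenitor p2.
example_parents = {"p1": (), "p2": (), "f": ("p1",), "a": ("f", "p2")}
gen = make_gen(example_parents)
```

Here `gen("a")` takes the larger generation of its two parents and adds one, as in the definition.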

The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$\Phi_{ab,cd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}+2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}+1} \Phi_{AA} \right) + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB}, \tag{16}$$

where $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$: a common ancestor of $a$ and $b$; and $T$: a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).

Figure 13: Scenarios $S_0$–$S_{16}$ of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level.

Type 1: zero root homo-overlap and zero root heter-overlap.
Type 2: zero root homo-overlap and one root heter-overlap path $P_{Ar}$ ending at $r$.
Type 3: either of the following (17):
(i) zero root homo-overlap and two root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively;
(ii) one root homo-overlap path $P_{Ah}$ ending at $h$ and two root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, and $r_1 = r_2$.
Type 4: one root homo-overlap path $P_{Ah}$ ending at $h$ and two root heter-overlap paths ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individual; $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individual. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$B = \begin{cases} S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\ S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\ T & \text{otherwise}, \end{cases}$$

$$L_{2\text{-pair}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{Ah}} & \text{for Type 4}, \end{cases}$$

$$L_{\langle P_{Sa}, P_{Sb} \rangle} = L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5}, \qquad L_{\langle P_{Tc}, P_{Td} \rangle} = L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}. \tag{18}$$

Note that, if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \cdot \Phi_{TT}. \tag{19}$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.

3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by making comparisons with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node is a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $\mathrm{NC}(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows:

(1) If $u$ is $r$, then $\mathrm{NC}(r)$ contains only one element: the empty string.
(2) Otherwise, let $u$ be a node with $\mathrm{NC}(u)$ and $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then, for each $x$ in $\mathrm{NC}(u)$, a code $x i {*}$ is added to $\mathrm{NC}(v_i)$, where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
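The labeling rules above can be sketched as follows. This is an illustration rather than the implementation used in the experiments: the function name `assign_nodecodes`, the `"."` separator between steps, and the single-letter gender marks are assumptions made for readability. A node is processed only after all of its parents have been processed, so that $\mathrm{NC}(u)$ is complete before it is propagated to the children of $u$.

```python
from collections import deque


def assign_nodecodes(children, gender, root):
    """Assign NodeCodes from a single progenitor `root`.

    children: dict node -> list of children in sibling order.
    gender:   dict node -> gender mark (e.g. "M" or "F").
    Returns a dict node -> set of code strings.
    """
    # Count parents so a node is expanded only once its codes are complete.
    indegree = {root: 0}
    for u, kids in children.items():
        indegree.setdefault(u, 0)
        for v in kids:
            indegree[v] = indegree.get(v, 0) + 1

    codes = {node: set() for node in indegree}
    codes[root].add("")                      # rule (1): NC(r) = {""}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            step = f"{i}{gender[v]}"         # sibling index plus gender mark
            for x in codes[u]:               # rule (2): extend every code of u
                codes[v].add(step if x == "" else f"{x}.{step}")
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return codes


# r has children u (male) and v (female); u and v have a common child w.
example_children = {"r": ["u", "v"], "u": ["w"], "v": ["w"]}
example_gender = {"u": "M", "v": "F", "w": "M"}
nodecodes = assign_nodecodes(example_children, example_gender, "r")
```

In this example, `w` receives one code per path from `r`, which is the property that makes NodeCodes usable for path counting.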


Computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied for the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.

Figure 14: The effect of pedigree size on computation efficiency improvement (x-axis: individuals in pedigree; y-axis: average time (ms); series: Recursive, NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (x-axis: depth, 4 to 18; y-axis: average time (ms); series: Recursive, NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula for more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients which are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficients computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.

Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (scenarios $S_0$–$S_3$; the figure also marks the case in which $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, with the merged path labeled $P_{Aa}$).

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ by Lemma A.2. Now, we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$\Phi_{aab} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}-1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle}+1} \Phi_{AA} \right), \tag{A.1}$$

where $A$: a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap.
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.2}$$

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.

A.2. Path-Counting Formula for $\Phi_{aabc}$

Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.

We now present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

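$$\Phi_{aabc} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{quad}}-1}\Phi_{AAAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{quad}}}\Phi_{AAA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{quad}}+1}\Phi_{AA}\right) + \sum_{A}\left(\sum_{\text{Type 4}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+1}\Phi_{AAA} + \sum_{\text{Type 5}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+2}\Phi_{AA}\right), \tag{A.3}$$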

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path;

Type 2: one root 2-overlap path $P_{As}$ ending at $s$;

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;

Case 2: one root 3-overlap path $P_{At}$ ending at $t$;

Case 3: one root 2-overlap and one root 3-overlap paths $P_{As}$ and $P_{At}$ ending at $s$ and $t$, respectively.

(A.4)

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path;

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases}$$

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}. \end{cases} \tag{A.5}$$
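Because every type in (A.3) differs only in its exponent offset and in which coefficient of $A$ it multiplies, the formula lends itself to a table-driven evaluation. The sketch below is illustrative only: the names `A3_TERMS`, `condensed_length`, and `aabc_term` are hypothetical, and the founder values $\Phi_{AA} = 1/2$, $\Phi_{AAA} = 1/4$, $\Phi_{AAAA} = 1/8$ are an assumption (the values for a non-inbred founder under the recursive formulas of [10]).

```python
from fractions import Fraction

# Exponent offset and coefficient of A for each type, read off (A.3).
A3_TERMS = {
    1: (-1, "AAAA"),
    2: (0, "AAA"),
    3: (+1, "AA"),
    4: (+1, "AAA"),
    5: (+2, "AA"),
}

def condensed_length(path_lengths, overlap_lengths=()):
    """(A.5): sum of the path lengths minus the lengths of the root overlap paths."""
    return sum(path_lengths) - sum(overlap_lengths)

def aabc_term(kind, length, phi_of_A):
    """One summand of (A.3); phi_of_A maps 'AA', 'AAA', 'AAAA' to coefficients of A."""
    offset, coeff = A3_TERMS[kind]
    return Fraction(1, 2) ** (length + offset) * phi_of_A[coeff]

# Assumed coefficients of a non-inbred founder A.
FOUNDER = {"AA": Fraction(1, 2), "AAA": Fraction(1, 4), "AAAA": Fraction(1, 8)}

# Hypothetical Type 5 triple: path lengths 2, 2, 1 with a root 2-overlap of length 1.
L = condensed_length((2, 2, 1), (1,))
print(L, aabc_term(5, L, FOUNDER))  # 4 1/128
```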


Note that if $a$ is an ancestor of either $b$ or $c$, or of both of them, then the path-counting formula for $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 bc}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$

A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. Hence, for a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the default mergeable path-pair. We now present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

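$$\Phi_{aaab} = \sum_{A}\left(\frac{3}{2}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\right) + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\right), \tag{A.6}$$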

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path;

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.7}$$
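The only structural difference between (A.6) and (A.1) is the factor $3/2$ applied to the Type 1 and Type 2 terms and the larger exponent offset for the fully merged Type 3 term. A small sketch of a single summand follows; `aaab_term` is a hypothetical helper, and the example pedigree (two children of a non-inbred founder $A$ with $\Phi_{AA} = 1/2$) is assumed for illustration.

```python
from fractions import Fraction

def aaab_term(kind, length, phi):
    """One summand of (A.6).

    kind 1: no root 2-overlap, length = L_triple, phi = Phi_AAA
    kind 2: one root 2-overlap, length = L_triple, phi = Phi_AA
    kind 3: all three paths to a merged, length = L_pair, phi = Phi_AA
    """
    half = Fraction(1, 2)
    if kind == 1:
        return Fraction(3, 2) * half ** (length - 1) * phi
    if kind == 2:
        return Fraction(3, 2) * half ** length * phi
    if kind == 3:
        return half ** (length + 2) * phi
    raise ValueError("kind must be 1, 2, or 3")

# Hypothetical example: a and b are children of a founder A, so L_pair = 1 + 1.
print(aaab_term(3, 2, Fraction(1, 2)))  # 1/32
```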

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$).

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases for $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.

Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

Cases for $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

(B.1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;


Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Legend: parent-child relationship; ancestor-descendant relationship.)

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof. (Basis) There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^2\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^2\Phi_{AAc} = (1/2)^3\Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc} = (1/2)^2\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = (1/2)^3\Phi_{AAA}$.
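The two basis scenarios can be checked numerically by running the recursion directly. The sketch below is an assumption-laden illustration, not the paper's code: it uses the recursive formulas of [10] in the form $\Phi_{aa} = \frac{1}{2}(1 + \Phi_{fm})$, $\Phi_{ab} = \frac{1}{2}(\Phi_{fb} + \Phi_{mb})$, $\Phi_{aaa} = \frac{1}{4}(1 + 3\Phi_{fm})$, $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$, and $\Phi_{abc} = \frac{1}{2}(\Phi_{fbc} + \Phi_{mbc})$, with any coefficient involving a missing parent set to 0. The helper name `make_kinship` and the `order` ranking (parents ranked before children) are hypothetical.

```python
from fractions import Fraction

def make_kinship(parents, order):
    """parents: child -> (father, mother), None for a missing parent.
    order: individual -> rank, with every parent ranked before its children."""
    def par(x):
        return parents.get(x, (None, None))

    def phi2(a, b):
        if a is None or b is None:
            return Fraction(0)
        if a == b:
            return Fraction(1, 2) * (1 + phi2(*par(a)))
        if order[a] < order[b]:
            a, b = b, a              # expand the one that cannot be an ancestor
        f, m = par(a)
        return Fraction(1, 2) * (phi2(f, b) + phi2(m, b))

    def phi3(a, b, c):
        if None in (a, b, c):
            return Fraction(0)
        # x is the lowest-ranked descendant, so it is not an ancestor of y or z.
        x, y, z = sorted((a, b, c), key=order.get, reverse=True)
        f, m = par(x)
        if x == y == z:
            return Fraction(1, 4) * (1 + 3 * phi2(f, m))
        if x == y:
            return Fraction(1, 2) * (phi2(x, z) + phi3(f, m, z))
        return Fraction(1, 2) * (phi3(f, y, z) + phi3(m, y, z))

    return phi2, phi3

# Figure 18(a): c is the (single known) parent of a and b.
_, phi3_a = make_kinship({"a": ("c", None), "b": ("c", None)},
                         {"c": 0, "a": 1, "b": 2})
# Figure 18(b): A is the (single known) parent of a, b, and c.
_, phi3_b = make_kinship({"a": ("A", None), "b": ("A", None), "c": ("A", None)},
                         {"A": 0, "a": 1, "b": 2, "c": 3})

print(phi3_a("a", "b", "c"))  # 1/16 = (1/2)^2 * Phi_ccc with Phi_ccc = 1/4
print(phi3_b("a", "b", "c"))  # 1/32 = (1/2)^3 * Phi_AAA with Phi_AAA = 1/4
```

Both values agree with the Type 1 path-counting expressions above when the founder coefficient is taken to be $1/4$.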

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition of the induction hypothesis.

For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$. Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are the parents of $a$. In $G$, $a$ has only one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$\Phi_{abc} = \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA},$$

$$\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,$$

$$\therefore\ \Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}. \tag{B.2}$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1}\Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc}$.

Thus, the claim is true for $n = k + 1$.

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof. (Basis) There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.


Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2\Phi_{bc} = (1/2)^3\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2\Phi_{bc} = (1/2)^4\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2\Phi_{ssc} = (1/2)^3\Phi_{sc} = (1/2)^5\Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}+1}\Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^3\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4\Phi_{AA}$ and $\Phi_{abc} = (1/2)^5\Phi_{AA}$, respectively.
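The Type 2 computation above can be reproduced from explicit paths. In the sketch below, which is illustrative only, a path is a list of individuals starting at the common ancestor, and the root 2-overlap is modeled (as an assumption) by the longest common prefix shared by any two of the three paths; the function names are hypothetical and the founder value $\Phi_{AA} = 1/2$ is assumed.

```python
from fractions import Fraction
from itertools import combinations

def common_prefix_edges(p, q):
    """Number of edges two paths share from their common root before diverging."""
    shared = 0
    for u, v in zip(p[1:], q[1:]):
        if u != v:
            break
        shared += 1
    return shared

def type2_contribution(paths, phi_AA):
    """(1/2)^(L+1) * Phi_AA with L = sum of path lengths - root overlap length.

    Assumes the triple has exactly one root 2-overlap (Type 2)."""
    lengths = [len(p) - 1 for p in paths]
    overlap = max(common_prefix_edges(p, q) for p, q in combinations(paths, 2))
    L = sum(lengths) - overlap
    return Fraction(1, 2) ** (L + 1) * phi_AA

half = Fraction(1, 2)
# Figure 20(a): c -> b -> a, so P_ca = c,b,a; P_cb = c,b; P_cc = c.
print(type2_contribution([["c", "b", "a"], ["c", "b"], ["c"]], half))       # 1/16
# Figure 20(c): A -> s, s -> a, s -> b, A -> c.
print(type2_contribution([["A", "s", "a"], ["A", "s", "b"], ["A", "c"]], half))  # 1/64
```

With $\Phi_{cc} = \Phi_{AA} = 1/2$, these values equal $(1/2)^3\Phi_{cc}$ and $(1/2)^5\Phi_{AA}$, matching the recursive results.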

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$\Phi_{abc} = \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA},$$

$$\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,$$

$$\therefore\ \Phi_{abc} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}. \tag{B.3}$$

For Figures 21(b) and 21(c), we take the same steps as in the calculation of $\Phi_{abc}$ for Figure 21(a).

In summary, the claim is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ has only one parent $f$,

$$\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B.4}$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, the path-counting formula (A.1) is true for Case 2.3.
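The reduction of Case 2.3 to Wright's formula can be exercised on a small example by enumerating nonoverlapping path-pairs. This is a minimal sketch under stated assumptions: the pedigree (edges $A \to f$, $f \to a$, $A \to b$) is hypothetical rather than taken from Figure 22, "nonoverlapping" is interpreted as sharing no individual other than the common ancestor, and the self-coefficients $\Phi_{AA}$ are supplied by the caller (all $1/2$ here, i.e., no inbreeding).

```python
from fractions import Fraction

def paths_from(anc, target, children):
    """All downward paths anc -> ... -> target, each as a tuple of individuals."""
    if anc == target:
        return [(anc,)]
    found = []
    for child in children.get(anc, ()):
        found += [(anc,) + rest for rest in paths_from(child, target, children)]
    return found

def wright_kinship(a, b, children, phi_self):
    """Sum of (1/2)^(L_Aa + L_Ab) * Phi_AA over nonoverlapping path-pairs."""
    total = Fraction(0)
    for anc in phi_self:                       # every individual is a candidate ancestor
        for pa in paths_from(anc, a, children):
            for pb in paths_from(anc, b, children):
                if set(pa) & set(pb) == {anc}:  # the paths share only the ancestor
                    L = (len(pa) - 1) + (len(pb) - 1)
                    total += Fraction(1, 2) ** L * phi_self[anc]
    return total

# Hypothetical pedigree: a has the single known parent f, so Phi_fmb = 0 in (B.4).
children = {"A": ["f", "b"], "f": ["a"]}
phi_self = {x: Fraction(1, 2) for x in ("A", "f", "a", "b")}

phi_ab = wright_kinship("a", "b", children, phi_self)
phi_aab = Fraction(1, 2) * phi_ab              # (B.4)
print(phi_ab, phi_aab)  # 1/16 1/32
```

The value $1/32$ equals the Type 3 term $(1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$ with $L_{\langle P_{Aa}, P_{Ab}\rangle} = 2 + 1$.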

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 has been proven above. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).


Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. The proof is by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.

Induction Step. Now we show that the claim is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. That is, $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.
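The set $\mathit{Tri\_C}(a, b, c, G)$ and a topmost element $A_1$ can be found with a plain graph traversal. The sketch below is a hypothetical illustration: the function names are not from the paper, and an individual is counted as its own ancestor (an assumption consistent with Figure 18(a), where $c$ is a triple-common ancestor of $a$, $b$, and $c$).

```python
def ancestors_of(x, parents):
    """x together with all of its ancestors (x counts as its own ancestor here)."""
    seen, stack = set(), [x]
    while stack:
        cur = stack.pop()
        if cur is None or cur in seen:
            continue
        seen.add(cur)
        stack.extend(parents.get(cur, (None, None)))
    return seen

def triple_common_ancestors(a, b, c, parents):
    """Tri_C(a, b, c, G): individuals that are ancestors of all of a, b, and c."""
    return (ancestors_of(a, parents)
            & ancestors_of(b, parents)
            & ancestors_of(c, parents))

def topmost(candidates, parents):
    """Candidates that have no other candidate among their proper ancestors."""
    return {x for x in candidates
            if not (ancestors_of(x, parents) - {x}) & candidates}

# Hypothetical pedigree: A1 -> A2, and A2 is the single known parent of a, b, c.
parents = {"A2": ("A1", None), "a": ("A2", None),
           "b": ("A2", None), "c": ("A2", None)}
tri_c = triple_common_ancestors("a", "b", "c", parents)
print(sorted(tri_c), topmost(tri_c, parents))  # ['A1', 'A2'] {'A1'}
```

Several individuals may qualify as topmost; the induction step only needs one of them.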

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a: A_1 \to \cdots \to a;\ t_b: A_1 \to \cdots \to b;\ t_c: A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
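The eligibility condition used in case (2) is a simple membership test on the three paths. The following sketch is illustrative only; `is_eligible` is a hypothetical helper, paths are lists of individuals starting at $A_1$, and "passes through" is taken to mean that the lower ancestor appears on the path.

```python
def is_eligible(path_triple, lower_ancestors):
    """False when all three paths pass through the same lower triple-common ancestor."""
    root = path_triple[0][0]
    for anc in lower_ancestors:
        if anc != root and all(anc in path for path in path_triple):
            return False
    return True

lower = {"A2"}
# All three paths from A1 run through A2: the triple is counted under A2, not A1.
through_a2 = [["A1", "A2", "a"], ["A1", "A2", "b"], ["A1", "A2", "c"]]
# One path avoids A2 (via a hypothetical individual x), so the triple stays eligible.
mixed = [["A1", "x", "a"], ["A1", "A2", "b"], ["A1", "A2", "c"]]

print(is_eligible(through_a2, lower), is_eligible(mixed, lower))  # False True
```

Filtering in this way is what keeps $S(A_1)$ from double counting the path-triples already attributed to $A_j$ in $\Phi_{abc}(G')$.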

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First, for four individuals in a pedigree graph $G$, we present all the different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Cases for $\Phi_{aaab}$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.

Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.


Figure 23: Dependency graph for different cases for four individuals (nodes: Cases 2.1, 2.2, 2.3, 3.1, 3.2, 3.3.1, 3.3.2, 3.3.3, 3.4, 3.5, 4.1, 4.2, 4.3.1, 4.3.2, 4.3.3, and the coefficients $\Phi_{AAA}$ and $\Phi_{AA}$).

Cases for $\Phi_{aabc}$:

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.

Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlap.

Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.

Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.

Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.

Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.

Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlap.

Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.

Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.

(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlap.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlap.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlap and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Figure 24: Dependency graph for different cases for two pairs of individuals (nodes: Cases 4.1, 4.2, 4.3.1, 4.3.2, 4.4, 4.6, 4.7, 4.8, and the coefficients $\Phi_{AAAA}$, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{TT}$).

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General’s New Family Health History Tool Is Released, Ready for “21st Century Medicine,” http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., “A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia,” The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., “New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate,” Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malecot, Les mathématiques de l’hérédité, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, “La relation d’identité en génétique,” Annales de l’Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, “Genotypic covariances between inbred relatives,” Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, “Logique du calcul des coefficients d’identité entre deux individus,” Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, “A recursive algorithm for the calculation of identity coefficients,” Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, “Efficient evaluation of inbreeding queries on pedigree data,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, “Efficiently calculating inbreeding on large pedigrees databases,” Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, “Using compact encodings for path-based computations on pedigree graphs,” in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB ’11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Scalable computation of kinship and identity coefficients on large pedigrees,” in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB ’08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Efficient computation of kinship and identity coefficients on large pedigrees,” Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, “Coefficients of inbreeding and relationship,” The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, “Kinship and identity algorithm of coefficients of identity,” Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., “Data mining applied to linkage disequilibrium mapping,” The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, “Calculation of the inbreeding coefficient,” Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


Figure 7: Building blocks $B_1$ (paths $P_{Aa}$, $P_{Ab}$) and $B_2$ (paths $P_{Aa}$, $P_{Ab}$, $P_{Ac}$) and basic rules. For $B_1$, the edge can have three options: case 1, $T$; case 2, $X$; case 3, $TX$. For $B_2$, there can be at most one edge belonging to root overlap (either $T$ or $TX$).

Figure 8: A graphical illustration for obtaining $T_3$. The scenario $S_3$, with edges $e_1$, $e_2$, $e_3$, is decomposed into $u_1$, $u_2$, $u_3$, and $T_3 = R_1 \bowtie R_2 \bowtie R_3$. Note: $R_i$ denotes all acceptable path-triples for $u_i$.

$P_{Aa}$ and $P_{Ac}$ must share at least one common individual except $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ to $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples, denoted as $R_i$.

Considering the dependency among $R_1$, $R_2$, $R_3$, we use the natural join operator, denoted as $\bowtie$, operating on $R_1$, $R_2$, $R_3$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in the scenario $S_3$.
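The join $T_3 = R_1 \bowtie R_2 \bowtie R_3$ can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the edge names `e1`, `e2`, `e3`, the assignment of two edges to each $u_i$, and the helper names `b2_cases` and `natural_join` are hypothetical. The only rule encoded is the $B_2$ rule from Figure 7 (at most one edge of a block may belong to root overlap, i.e., be labeled $T$ or $TX$).

```python
from itertools import product

LABELS = ("T", "X", "TX")  # root overlap, crossover, or both (Figure 7)

def b2_cases(edge_a, edge_b):
    """Acceptable labelings of one B2 block: at most one edge has root overlap."""
    rows = []
    for label_a, label_b in product(LABELS, repeat=2):
        if "T" in label_a and "T" in label_b:
            continue  # two root-overlap edges in one block are not acceptable
        rows.append({edge_a: label_a, edge_b: label_b})
    return rows

def natural_join(left, right):
    """Relational natural join: combine rows that agree on all shared columns."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                joined.append({**l_row, **r_row})
    return joined

# Assumed decomposition of S3: each u_i is a B2 block covering two of the edges.
R1 = b2_cases("e1", "e2")
R2 = b2_cases("e2", "e3")
R3 = b2_cases("e3", "e1")
T3 = natural_join(natural_join(R1, R2), R3)
```

Under these assumptions the join keeps exactly the labelings in which at most one of the three edges belongs to root overlap and the remaining edges are crossovers.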

For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. The scenario $S_0$ has no edges, and it shows that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or
(2) one edge belongs to root 2-overlap, and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but zero 2-overlap paths.

3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals to a noncrossover path-triple without changing the contribution from this path-triple to $\Phi_{abc}$. The main purpose of using the splitting operator is to simplify the path-counting formula derivation process. We first use an example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ into two nodes $s_1$ and $s_2$;
(2) transform the edges $s \to a'$ and $s \to b'$ to $s_1 \to a'$ and $s_2 \to b'$, respectively;
(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
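The three steps can be sketched in Python on a pedigree stored as a child-adjacency map. This is a minimal sketch under assumptions: the function name `split_crossover`, the node names, and the dictionary representation are hypothetical. Redirecting the incoming edges ($x \to s$ becomes $x \to s_1$, $w \to s$ becomes $w \to s_2$) is read off Figure 9 rather than stated in the steps above.

```python
def split_crossover(children, s, x, w):
    """Return a copy of the graph in which crossover node s is split in two.

    children maps each node to the set of its children; x and w are the two
    distinct parents through which the crossing paths enter s.
    """
    g = {u: set(vs) for u, vs in children.items()}
    s1, s2 = s + "1", s + "2"      # hypothetical names for the two copies
    below = g.pop(s)                # former children of s, e.g. {a', b'}
    # Step (1): s is replaced by s1 (entered from x) and s2 (entered from w).
    g[x] = (g[x] - {s}) | {s1}
    g[w] = (g[w] - {s}) | {s2}
    # Steps (2) and (3): each copy points at every former child of s, so the
    # edges s1 -> a', s2 -> b', s2 -> a', and s1 -> b' all exist afterwards.
    g[s1] = set(below)
    g[s2] = set(below)
    return g

# The relevant part of G_{k+1} in Figure 9, with the dotted segments omitted.
pedigree = {"A": {"x", "w", "c"}, "x": {"s"}, "w": {"s"}, "s": {"a'", "b'"}}
split = split_crossover(pedigree, "s", "x", "w")
```

After the call, the path to $a'$ runs through $s_1$ and the path to $b'$ runs through $s_2$, so the two paths no longer meet at a single node.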

Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.

Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If a new crossover node appears, the only possible node is either $a'$ or $b'$. Assume $b'$ becomes a crossover individual; this means that $b'$ is able to reach $a$ and $b$ from two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.

Next, we introduce a canonical graph, which results from applying the splitting operator to all crossover individuals. The canonical graph has zero crossover individuals.

Definition 5 (Canonical Graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, if there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$, and
(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$,

we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.

Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, there exists a canonical graph $G'$ for $G$.

Figure 9: Transforming pedigree graph $G_{k+1}$, having $k + 1$ crossovers, to $G_k$, having $k$ crossovers. For $G_{k+1}$, $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ is $A \to \cdots \to x \to s \to a' \to \cdots \to a$, $A \to \cdots \to w \to s \to b' \to \cdots \to b$, and $A \to c$. For $G_k$, $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ is $A \to \cdots \to x \to s_1 \to a' \to \cdots \to a$, $A \to \cdots \to w \to s_2 \to b' \to \cdots \to b$, and $A \to c$. (Legend: ancestor-descendant relationship; parent-child relationship.)

Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ (scenarios $S_0$–$S_{10}$).

Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4. By the induction hypothesis, $G_k$ has a canonical graph $G'$; since the splitting operator does not change the contribution of a path-triple to $\Phi_{abc}$, $G'$ is also a canonical graph for $G_{k+1}$.

3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:

$$\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AA} \right), \tag{12}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2,} \end{cases} \tag{13}$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).

For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
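As a worked illustration of (12) and (13), the following Python sketch evaluates the contribution of a single path-triple. The function and parameter names are hypothetical, and the caller is assumed to have already enumerated the acceptable path-triples and identified their type; only the arithmetic of the formula is shown.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_AA(F_A):
    """Phi_AA = (1/2)(1 + F_A)."""
    return HALF * (1 + F_A)

def phi_AAA(F_A):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * F_A)

def triple_term(path_lengths, F_A, root_overlap_length=None):
    """Contribution of one acceptable path-triple <P_Aa, P_Ab, P_Ac>.

    path_lengths: the three path lengths (L_PAa, L_PAb, L_PAc).
    root_overlap_length: None for Type 1, or L_PAs for Type 2.
    """
    total = sum(path_lengths)
    if root_overlap_length is None:                # Type 1
        return HALF ** total * phi_AAA(F_A)
    L_triple = total - root_overlap_length         # Type 2, equation (13)
    return HALF ** (L_triple + 1) * phi_AA(F_A)

# Three full siblings a, b, c of two unrelated, non-inbred parents: each parent
# is a triple-common ancestor with one Type 1 path-triple of lengths (1, 1, 1).
full_sibs = 2 * triple_term([1, 1, 1], F_A=Fraction(0))

# A Type 2 triple with lengths (2, 2, 1) and a root 2-overlap path of length 1.
type2_example = triple_term([2, 2, 1], F_A=Fraction(0), root_overlap_length=1)
```

For the full-sibling case the sum is $2 \cdot (1/2)^3 \cdot (1/4) = 1/16$, which agrees with a direct argument: the three sampled alleles must all come from the same parent (probability $1/8$ per parent) and all be copies of the same parental allele (probability $1/4$).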

3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ and $\mathrm{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios $S_0$–$S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks $B_1$, $B_2$, $B_3$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that, for $B_3$, if $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.

3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 11, we first decompose $S_i$ to one or multiple building blocks. For a scenario $S_i \in \{S_1, S_3\}$, it has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$ because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.
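Because $u_1$ and $u_2$ share no edge, the construction of $T_2$ reduces to pairing every case of one block with every case of the other. A minimal Python sketch, assuming hypothetical edge names `e1` and `e2` and the three $B_1$ options from Figure 7:

```python
from itertools import product

B1_OPTIONS = ("T", "X", "TX")  # the three options for the single edge of B1

R1 = [{"e1": label} for label in B1_OPTIONS]  # acceptable cases of u1
R2 = [{"e2": label} for label in B1_OPTIONS]  # acceptable cases of u2

# u1 and u2 are disconnected, so no consistency check is needed:
# T2 is the plain Cartesian product of the two relations.
T2 = [{**r1, **r2} for r1, r2 in product(R1, R2)]
```

Here `T2` has $3 \times 3 = 9$ rows, one per combination of edge labels.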

Figure 11: Building blocks $B_1$, $B_2$, $B_3$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For $B_3$, where $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$, all three edges belong to root overlap (i.e., having root 3-overlap).

Table 2: Largest subgraph of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).

$S_i$ | $S_4$ | $S_5$ | $S_7$ | $S_8$ | $S_9$ | $S_{10}$
$S_j$ | $S_3$ | $S_3$ | $S_6$ | $S_5$ | $S_7$ | $S_9$

For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;
(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;
(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.

For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply the same idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;
(2) otherwise, there can be at most two root 2-overlap paths.

3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 2} \Phi_{AA} \right), \tag{14}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$;

Type 1: zero root 2-overlap and zero root 3-overlap path;
Type 2: one root 2-overlap path $P_{As}$ ending at $s$;
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$, $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively;

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 * L_{P_{At}} & \text{for Case 2} \in \text{Type 3,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3,} \end{cases} \tag{15}$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).

For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
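The case analysis in (14) and (15) can be written down directly. The sketch below evaluates one path-quad term; the function name `quad_term`, the `kind` strings, and the `overlap_lengths` parameter are hypothetical conventions chosen for illustration, and classifying a path-quad into its type is assumed to have been done elsewhere.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_AA(F_A):
    return HALF * (1 + F_A)

def phi_AAA(F_A):
    return Fraction(1, 4) * (1 + 3 * F_A)

def phi_AAAA(F_A):
    return Fraction(1, 8) * (1 + 7 * F_A)

def quad_term(path_lengths, F_A, kind="type1", overlap_lengths=()):
    """Contribution of one acceptable path-quad, following (14) and (15)."""
    total = sum(path_lengths)
    if kind == "type1":
        return HALF ** total * phi_AAAA(F_A)
    if kind == "type2":
        (L_s,) = overlap_lengths
        return HALF ** (total - L_s + 1) * phi_AAA(F_A)
    if kind == "type3_case1":          # two root 2-overlap paths
        L_s1, L_s2 = overlap_lengths
        L_quad = total - L_s1 - L_s2
    elif kind == "type3_case2":        # one root 3-overlap path
        (L_t,) = overlap_lengths
        L_quad = total - 2 * L_t
    elif kind == "type3_case3":        # one root 3-overlap and one root 2-overlap
        L_t, L_s = overlap_lengths
        L_quad = total - L_t - L_s
    else:
        raise ValueError(f"unknown kind: {kind}")
    return HALF ** (L_quad + 2) * phi_AA(F_A)

# Four full siblings of two unrelated, non-inbred parents: one Type 1 quad per parent.
four_full_sibs = 2 * quad_term([1, 1, 1, 1], F_A=Fraction(0))

# Arithmetic check of Case 2: lengths (2, 2, 2, 1), root 3-overlap of length 1.
case2_example = quad_term([2, 2, 2, 1], Fraction(0), "type3_case2", (1,))
```

The sibling example gives $2 \cdot (1/2)^4 \cdot (1/8) = 1/64$.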

Figure 12: Examples of 2-pair-path-quads $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ for $\Phi_{ab,cd}$. (a) $A \to s \to a$, $A \to s \to b$, $A \to t \to c$, $A \to t \to d$. (b) $A \to x \to a$, $A \to x \to d$, $A \to y \to b$, $A \to y \to c$.

3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathrm{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathrm{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathrm{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.

Example 8. $A$ is a quad-common ancestor for $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$, and $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$, and $A \to x$ and $A \to y$ are root heter-overlap paths.
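Example 8 can be reproduced with a small Python sketch that finds root overlap paths as shared prefixes and labels them homo or heter according to which pair the two paths belong to. The representation (each path as a list of node names starting at $A$) and the function names are assumptions made for illustration.

```python
def root_overlap_end(p, q):
    """Last node of the prefix shared by paths p and q, which both start at A.

    Returns None when the two paths share only the ancestor A itself.
    """
    k = 0
    while k < min(len(p), len(q)) and p[k] == q[k]:
        k += 1
    return p[k - 1] if k > 1 else None

def classify_root_overlaps(paths):
    """paths maps 'a', 'b', 'c', 'd' to node lists starting at the common ancestor."""
    homo_pairs = [("a", "b"), ("c", "d")]                          # within a pair
    heter_pairs = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]  # across pairs
    result = {"homo": {}, "heter": {}}
    for kind, pairs in (("homo", homo_pairs), ("heter", heter_pairs)):
        for i, j in pairs:
            end = root_overlap_end(paths[i], paths[j])
            if end is not None:
                result[kind][(i, j)] = end
    return result

# The two path sets listed for Figure 12.
fig12a = {"a": ["A", "s", "a"], "b": ["A", "s", "b"],
          "c": ["A", "t", "c"], "d": ["A", "t", "d"]}
fig12b = {"a": ["A", "x", "a"], "b": ["A", "y", "b"],
          "c": ["A", "y", "c"], "d": ["A", "x", "d"]}

case_a = classify_root_overlaps(fig12a)
case_b = classify_root_overlaps(fig12b)
```

For (a) the sketch reports root homo-overlap paths ending at $s$ and $t$ and no heter-overlap; for (b) it reports root heter-overlap paths ending at $x$ and $y$ and no homo-overlap, matching the example.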

3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge can be $T$, $X$, and $TX$ (refer to Section 3.1.1 for the definitions of $T$, $X$, and $TX$). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap
 | Zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap
 | Zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \neq h$
 | One root homo-overlap and two root heter-overlaps, and $h = r_1 \neq r_2$

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define that the generation of a progenitor $p_i$ is 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
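The definition of $\mathrm{gen}$ is a simple recursion over the parents of each individual. A minimal Python sketch, assuming the pedigree is given as a hypothetical `parents` dictionary that maps each individual to a tuple of its recorded parents (individuals without an entry are treated as progenitors):

```python
from functools import lru_cache

def make_gen(parents):
    """Build gen() for a pedigree given as individual -> tuple of parents."""
    @lru_cache(maxsize=None)
    def gen(individual):
        recorded = parents.get(individual, ())
        if not recorded:                 # progenitor: generation 0
            return 0
        # One parent: gen(p) + 1; two parents: MAX{gen(f), gen(m)} + 1.
        return max(gen(p) for p in recorded) + 1
    return gen

# Hypothetical pedigree: f has progenitor parents g1 and g2, a has parents
# f and m (m is a progenitor), and b has the single recorded parent a.
parents = {"f": ("g1", "g2"), "a": ("f", "m"), "b": ("a",)}
gen = make_gen(parents)
```

With this pedigree, `gen("m")` is 0, `gen("f")` is 1, `gen("a")` is 2, and `gen("b")` is 3. The recursion depth equals the pedigree depth, so very deep pedigrees would call for an iterative traversal instead.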

The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$\Phi_{ab,cd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}} + 2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}} + 1} \Phi_{AA} \right) + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB}, \tag{16}$$

where $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$: a common ancestor of $a$ and $b$; and $T$: a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).

Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level (scenarios $S_0$–$S_{16}$).

Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

Type 3: either of the following two cases:

$$\text{Type 3} = \begin{cases} \text{zero root homo-overlap and two root heter-overlaps } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2 \text{, respectively;} \\ \text{one root homo-overlap } P_{Ah} \text{ ending at } h \text{ and two root heter-overlaps } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2 \text{, and } r_1 \neq r_2. \end{cases} \tag{17}$$

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individual; $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individual. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$B = \begin{cases} S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\ S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents,} \\ T & \text{otherwise,} \end{cases}$$

$$L_{2\text{-pair}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 * L_{P_{Ah}} & \text{for Type 4,} \end{cases}$$

$$L_{\langle P_{Sa}, P_{Sb} \rangle} = L_{P_{Sa}} + L_{P_{Sb}} \ \text{ for Type 5,} \qquad L_{\langle P_{Tc}, P_{Td} \rangle} = L_{P_{Tc}} + L_{P_{Td}} \ \text{ for Type 5.} \tag{18}$$
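The case definitions for $B$ and $L_{2\text{-pair}}$ translate into two small functions. The sketch below is an illustration only: the names `choose_B` and `l_two_pair`, the integer `kind` codes, and the dictionary inputs are hypothetical, and the first comparison in the rule for $B$ is read as a strict "less than".

```python
def choose_B(S, T, gen, num_parents):
    """Pick B for a Type 5 term from the case rule for B.

    gen maps an individual to its generation; num_parents maps an individual
    to the number of parents recorded for it in the pedigree.
    """
    if gen[S] < gen[T]:
        return S
    if gen[S] == gen[T] and num_parents[T] == 2:
        return S
    return T

def l_two_pair(path_lengths, kind, overlap_lengths=()):
    """L_2-pair for Types 1 to 4, following equation (18)."""
    total = sum(path_lengths)        # L_PAa + L_PAb + L_PAc + L_PAd
    if kind == 1:
        return total
    if kind == 2:
        (L_r,) = overlap_lengths
        return total - L_r
    if kind == 3:
        L_r1, L_r2 = overlap_lengths
        return total - L_r1 - L_r2
    if kind == 4:
        (L_h,) = overlap_lengths
        return total - 2 * L_h
    raise ValueError(f"unknown type: {kind}")

# Figure 12(b): four paths of length 2 with root heter-overlap paths A -> x and
# A -> y (length 1 each) and no root homo-overlap, i.e. a Type 3 case.
L_fig12b = l_two_pair([2, 2, 2, 2], kind=3, overlap_lengths=(1, 1))

B_example = choose_B("S", "T", gen={"S": 1, "T": 2}, num_parents={"S": 2, "T": 2})
```

For the Figure 12(b) paths this gives $L_{2\text{-pair}} = 8 - 1 - 1 = 6$.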

Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} * \Phi_{TT}. \tag{19}$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
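Equation (19) is a double sum over admissible pairs of path-pairs. A minimal Python sketch, assuming the Type 6 path-pairs have already been enumerated and that every listed path-pair for $\langle a, b \rangle$ is compatible with every listed path-pair for $\langle c, d \rangle$ (as is the case when the two families share no individuals); the input format and function names are hypothetical:

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

def phi_XX(F_X):
    """Phi_XX = (1/2)(1 + F_X) for an ancestor X."""
    return HALF * (1 + F_X)

def type6_sum(pairs_ab, pairs_cd):
    """Evaluate (19).

    pairs_ab: one (F_S, L) entry per nonoverlapping path-pair <P_Sa, P_Sb>,
              where L = L_PSa + L_PSb; pairs_cd likewise for <P_Tc, P_Td>.
    """
    total = Fraction(0)
    for (F_S, L_ab), (F_T, L_cd) in product(pairs_ab, pairs_cd):
        total += HALF ** (L_ab + L_cd) * phi_XX(F_S) * phi_XX(F_T)
    return total

# a, b are full siblings; c, d are full siblings from an unrelated family.
# Each sibship has two non-inbred parents, each giving one path-pair of length 1 + 1.
sibs_ab = [(Fraction(0), 2), (Fraction(0), 2)]
sibs_cd = [(Fraction(0), 2), (Fraction(0), 2)]
phi_ab_cd = type6_sum(sibs_ab, sibs_cd)
```

The result is $4 \cdot (1/2)^4 \cdot (1/2) \cdot (1/2) = 1/16$, which equals $\Phi_{ab} \cdot \Phi_{cd} = (1/4)(1/4)$, as expected for two independent pairs.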

3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by making comparisons with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node is a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph, and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as NC($u$), is assigned using a breadth-first-search traversal starting from $r$ as follows:

(1) If $u$ is $r$, then NC($r$) contains only one element: the empty string.
(2) Otherwise, let $u$ be a node with NC($u$), and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then, for each $x$ in NC($u$), a code $xi*$ is added to NC($v_i$), where $0 \le i \le k$, and $*$ indicates the gender of the individual represented by node $v_i$.
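The two rules can be sketched in Python as follows. This is an illustration under assumptions, not the implementation evaluated in the experiments: the function name, the input dictionaries, and the gender markers `"M"` and `"F"` are hypothetical. The sketch expands a node only after all of its parents have been processed (a Kahn-style queue rather than a plain breadth-first search), so that NC($u$) is complete before it is propagated to the children of $u$.

```python
from collections import deque

def assign_nodecodes(children, gender, root):
    """Assign NodeCodes to every node reachable from root.

    children: node -> list of its children in sibling order.
    gender:   node -> gender marker appended to each code of that node.
    """
    # Number of parents still to be processed for each node.
    pending = {}
    for u, kids in children.items():
        for v in kids:
            pending[v] = pending.get(v, 0) + 1

    codes = {root: {""}}               # rule (1): the empty string
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            # Rule (2): every code x of u yields the code x + i + gender for v_i.
            codes.setdefault(v, set()).update(f"{x}{i}{gender[v]}" for x in codes[u])
            pending[v] -= 1
            if pending[v] == 0:        # all parents of v have been processed
                queue.append(v)
    return codes

# Virtual progenitor r with children f (male) and m (female), who have a daughter c.
children = {"r": ["f", "m"], "f": ["c"], "m": ["c"]}
gender = {"f": "M", "m": "F", "c": "F"}
nc = assign_nodecodes(children, gender, "r")
```

Here `nc["c"]` contains two codes, `"0M0F"` and `"1F0F"`, one for each path from `r` to `c`. With more than ten siblings the concatenated indices would become ambiguous, so a real encoding would need a delimiter or fixed-width indices.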

Computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20] to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.

Figure 14: The effect of pedigree size on computation efficiency improvement (x-axis: individuals in pedigree, from 77 to 195,197; y-axis: average time (ms); series: Recursive, NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (x-axis: depth, from 4 to 18; y-axis: average time (ms); series: Recursive, NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula for more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficients computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.

Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.


Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$. (Panels $S_0$ to $S_3$; the diagram also marks the case in which $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable and is drawn as a single path $P_{Aa}$.)

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.
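As a minimal sketch of Definition A.1, the check below treats a path as a sequence of individual identifiers from the common ancestor down to the target individual. The representation and the function names `is_mergeable` and `condense_triple` are illustrative assumptions, not part of the paper's implementation.

```python
def is_mergeable(path_1, path_2):
    """Definition A.1: two paths are mergeable iff they are completely identical."""
    return tuple(path_1) == tuple(path_2)


def condense_triple(p_a1, p_a2, p_b):
    """Condense <P_Aa1, P_Aa2, P_Ab> to <P_Aa, P_Ab> when the first two paths merge.

    Hypothetical helper: returns a 2-tuple of paths in the mergeable case and
    the unchanged 3-tuple of paths otherwise.
    """
    if is_mergeable(p_a1, p_a2):
        return (tuple(p_a1), tuple(p_b))
    return (tuple(p_a1), tuple(p_a2), tuple(p_b))


# Both copies of a descend through the same parent f: the pair is mergeable.
same = condense_triple(["A", "f", "a"], ["A", "f", "a"], ["A", "b"])
# The two copies of a descend through different parents: the pair is not mergeable.
different = condense_triple(["A", "f", "a"], ["A", "m", "a"], ["A", "b"])
```

In the mergeable case the result has two paths, which corresponds to the Type 3 path-pair used in the formula for $\Phi_{aab}$.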

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$\Phi_{aab} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA} \right), \tag{A1}$$

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:
Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap;
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A2}$$
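The per-type terms of (A1) and (A2) can be sketched as a small function. This is only an illustration under stated assumptions: the function name `aab_term`, the dictionary keys, and the example path lengths are hypothetical, and the values $\Phi_{AA} = 1/2$ and $\Phi_{AAA} = 1/4$ assume a non-inbred founder $A$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def aab_term(path_type, lengths, phi, overlap_len=0):
    """Contribution of one path-triple (or condensed path-pair) to Phi_aab.

    path_type   : 1, 2, or 3, as in the type list following (A1)
    lengths     : path lengths; keys "a1", "a2", "b" for Types 1-2, "a", "b" for Type 3
    phi         : coefficients of the common ancestor, keys "AA" and "AAA"
    overlap_len : length of the root 2-overlap path P_As (Type 2 only)
    """
    if path_type == 1:
        l_triple = lengths["a1"] + lengths["a2"] + lengths["b"]
        return HALF ** (l_triple - 1) * phi["AAA"]
    if path_type == 2:
        l_triple = lengths["a1"] + lengths["a2"] + lengths["b"] - overlap_len
        return HALF ** l_triple * phi["AA"]
    if path_type == 3:
        l_pair = lengths["a"] + lengths["b"]
        return HALF ** (l_pair + 1) * phi["AA"]
    raise ValueError("path_type must be 1, 2, or 3")


# Assumed coefficients for a non-inbred founder A.
founder = {"AA": Fraction(1, 2), "AAA": Fraction(1, 4)}

# Arbitrary illustrative path lengths, one example per type.
type1 = aab_term(1, {"a1": 1, "a2": 1, "b": 1}, founder)
type2 = aab_term(2, {"a1": 2, "a2": 2, "b": 1}, founder, overlap_len=1)
type3 = aab_term(3, {"a": 2, "b": 1}, founder)
```

Summing such terms over all eligible path-triples and all common ancestors $A$ gives the double sum in (A1); enumerating and classifying the path-triples themselves is outside the scope of this sketch.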

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.

A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

$$\begin{aligned} \Phi_{aabc} ={} & \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}} - 1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AA} \right) \\ & + \sum_{A} \left( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 2} \Phi_{AA} \right), \end{aligned} \tag{A3}$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:
Type 1: zero root 2-overlap and zero root 3-overlap path;
Type 2: one root 2-overlap path $P_{As}$ ending at $s$;
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
Case 3: one root 2-overlap and one root 3-overlap paths $P_{As}$ and $P_{At}$ ending at $s$ and $t$, respectively. (A4)

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:
Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path;
Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases}$$

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}. \end{cases} \tag{A5}$$
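For the non-mergeable part of (A3), the case analysis in (A4) and (A5) amounts to a lookup on the number of root 2-overlap and root 3-overlap paths. The sketch below returns the exponent of $1/2$ and the name of the coefficient that multiplies it. The function names and the tuple-based inputs are assumptions made for illustration; detecting the overlaps in a pedigree graph is not shown.

```python
# (number of root 2-overlaps, number of root 3-overlaps) -> (type, case)
QUAD_TYPES = {
    (0, 0): (1, None),
    (1, 0): (2, None),
    (2, 0): (3, 1),
    (0, 1): (3, 2),
    (1, 1): (3, 3),
}

# type -> (offset added to L_quad, coefficient of the common ancestor), per (A3)
QUAD_TERMS = {1: (-1, "Phi_AAAA"), 2: (0, "Phi_AAA"), 3: (1, "Phi_AA")}


def classify_quad(two_overlaps, three_overlaps):
    """Return (type, case) for a path-quad whose pair <P_Aa1, P_Aa2> is not mergeable."""
    key = (len(two_overlaps), len(three_overlaps))
    if key not in QUAD_TYPES:
        raise ValueError(f"overlap combination {key} is not covered by (A4)")
    return QUAD_TYPES[key]


def l_quad(path_lengths, two_overlaps=(), three_overlaps=()):
    """L_quad per (A5): total path length minus the lengths of the overlap paths."""
    return sum(path_lengths) - sum(two_overlaps) - sum(three_overlaps)


def aabc_exponent_and_coefficient(path_lengths, two_overlaps=(), three_overlaps=()):
    """Exponent of 1/2 and coefficient name for one non-mergeable path-quad in (A3)."""
    path_type, _case = classify_quad(two_overlaps, three_overlaps)
    offset, coefficient = QUAD_TERMS[path_type]
    return l_quad(path_lengths, two_overlaps, three_overlaps) + offset, coefficient


# Illustrative lengths: four paths of lengths 2, 2, 1, 1 and one root 2-overlap of length 1.
example = aabc_exponent_and_coefficient((2, 2, 1, 1), two_overlaps=(1,))
```

Overlap lists are passed as tuples of overlap-path lengths, so the same call covers Type 1 (both empty) through Case 3 of Type 3 (one entry in each).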


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ can be condensed to $\langle P_{Aa}, P_{Ab} \rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. Hence, for a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the default mergeable path-pair. Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

$$\Phi_{aaab} = \sum_{A} \left( \frac{3}{2} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} \right) + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{pair}} + 2} \Phi_{AA} \right), \tag{A6}$$

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the mergeable path-pair):
Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has zero root 2-overlap path;
Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A7}$$

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$. (Nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{ab}$, $\Phi_{AA}$, and $\Phi_{AAA}$.)

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases:

For $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ having mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$.

For $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
(B1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A1):

(i) For $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2.

(ii) For Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$.


Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Legend: parent-child relationship; ancestor-descendant relationship.)

(iii) For Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = \frac{1}{2} \Phi_{cbc} = \left(\frac{1}{2}\right)^{2} \Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = \frac{1}{2} \Phi_{Abc} = \left(\frac{1}{2}\right)^{2} \Phi_{AAc} = \left(\frac{1}{2}\right)^{3} \Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor and we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc} = \left(\frac{1}{2}\right)^{2} \Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = \left(\frac{1}{2}\right)^{3} \Phi_{AAA}$.
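The basis computations for Figure 18 can be reproduced with a short recursive sketch. It uses the recursions quoted in this appendix, $\Phi_{abc} = \frac{1}{2}(\Phi_{fbc} + \Phi_{mbc})$ and $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$, together with the standard forms $\Phi_{aa} = \frac{1}{2}(1 + \Phi_{fm})$ and $\Phi_{aaa} = \frac{1}{4}(1 + 3\Phi_{fm})$, which are assumed here to match formulas (3) to (5) and [10]. The integer encoding of individuals, the convention that a missing parent contributes 0, and the name `make_kinship` are illustrative assumptions.

```python
from fractions import Fraction
from functools import lru_cache


def make_kinship(parents):
    """Build recursive kinship functions for a pedigree.

    parents maps each individual (an int) to a (father, mother) tuple, with
    None for a missing parent. Identifiers are assumed to be topologically
    ordered, so every parent has a smaller identifier than its children.
    """

    @lru_cache(maxsize=None)
    def phi2(a, b):
        if a is None or b is None:
            return Fraction(0)          # a missing parent contributes nothing
        if a < b:
            a, b = b, a                 # recurse on the younger individual
        f, m = parents[a]
        if a == b:
            return Fraction(1, 2) * (1 + phi2(f, m))
        return Fraction(1, 2) * (phi2(f, b) + phi2(m, b))

    @lru_cache(maxsize=None)
    def phi3(a, b, c):
        if a is None or b is None or c is None:
            return Fraction(0)
        a, b, c = sorted((a, b, c), reverse=True)   # a is the youngest
        f, m = parents[a]
        if a == b == c:
            return Fraction(1, 4) * (1 + 3 * phi2(f, m))
        if a == b:
            return Fraction(1, 2) * (phi2(a, c) + phi3(f, m, c))
        return Fraction(1, 2) * (phi3(f, b, c) + phi3(m, b, c))

    return phi2, phi3


# Figure 18(a): individual 0 plays the role of c, the parent of a (1) and b (2).
_, phi3_a = make_kinship({0: (None, None), 1: (0, None), 2: (0, None)})
fig18a = phi3_a(1, 2, 0)

# Figure 18(b): individual 0 plays the role of A, the parent of a (1), b (2), and c (3).
_, phi3_b = make_kinship({0: (None, None), 1: (0, None), 2: (0, None), 3: (0, None)})
fig18b = phi3_b(1, 2, 3)
```

With a non-inbred founder ($\Phi_{AAA} = 1/4$), the two results agree with $\left(\frac{1}{2}\right)^{2} \Phi_{ccc}$ and $\left(\frac{1}{2}\right)^{3} \Phi_{AAA}$ from the path-counting side.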

Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \rightarrow a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = \frac{1}{2} (\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2} \Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AAA}, \\ \because\ & L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\ \therefore\ & \Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}. \end{aligned} \tag{B2}$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{ccc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc}$.

Thus, it is true for $n = k + 1$.

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.


Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = \frac{1}{2} \Phi_{bbc} = \left(\frac{1}{2}\right)^{2} \Phi_{bc} = \left(\frac{1}{2}\right)^{3} \Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = \frac{1}{2} \Phi_{bbc} = \left(\frac{1}{2}\right)^{2} \Phi_{bc} = \left(\frac{1}{2}\right)^{4} \Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = \left(\frac{1}{2}\right)^{2} \Phi_{ssc} = \left(\frac{1}{2}\right)^{3} \Phi_{sc} = \left(\frac{1}{2}\right)^{5} \Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor and we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{cc} = \left(\frac{1}{2}\right)^{2 + 1} \Phi_{cc} = \left(\frac{1}{2}\right)^{3} \Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{4} \Phi_{AA}$ and $\Phi_{abc} = \left(\frac{1}{2}\right)^{5} \Phi_{AA}$, respectively.
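The Type 2 exponents quoted for Figure 20 can be checked with a few lines of arithmetic. The path lengths below are not stated in the text; they are read off the recursions above (for example, in Figure 20(c) both $a$ and $b$ are taken to be children of $s$, and $s$ and $c$ children of $A$), so they should be treated as assumptions about the figure. The function name `abc_type2_term` is hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def abc_type2_term(l_a, l_b, l_c, l_overlap, phi_AA):
    """Type 2 contribution to Phi_abc: (1/2)^(L + 1) * Phi_AA,
    with L = L_PAa + L_PAb + L_PAc - L_PAs."""
    l_triple = l_a + l_b + l_c - l_overlap
    return HALF ** (l_triple + 1) * phi_AA


# Setting Phi_AA to 1 isolates the power of 1/2 for comparison with the text.
ONE = Fraction(1)

# Figure 20(a): c -> b -> a, so L_Pca = 2, L_Pcb = 1, L_Pcc = 0; the overlap path P_cb has length 1.
fig20a = abc_type2_term(2, 1, 0, 1, ONE)
# Figure 20(b): A -> b -> a and A -> c; the overlap path P_Ab has length 1.
fig20b = abc_type2_term(2, 1, 1, 1, ONE)
# Figure 20(c): A -> s -> a, A -> s -> b, and A -> c; the overlap path P_As has length 1.
fig20c = abc_type2_term(2, 2, 1, 1, ONE)
```

The three values are $\left(\frac{1}{2}\right)^{3}$, $\left(\frac{1}{2}\right)^{4}$, and $\left(\frac{1}{2}\right)^{5}$, matching the coefficients obtained from the recursive formula.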

Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path and let $p$ be a parent of $a$. Then we cut the edge $p \rightarrow a$ from $G$ and obtain a new graph $G^{*}$ which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = \frac{1}{2} \Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2} \Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA}, \\ \because\ & L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\ \therefore\ & \Phi_{abc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}. \end{aligned} \tag{B3}$$

For Figures 21(b) and 21(c), we take the same steps as we calculate $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = \frac{1}{2} (\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,

$$\Phi_{aab} = \frac{1}{2} (\Phi_{ab} + \Phi_{fmb}) = \frac{1}{2} (\Phi_{ab} + 0) = \frac{1}{2} \Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B4}$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$.

Then we have $\Phi_{aab} = \frac{1}{2} \Phi_{ab} = \frac{1}{2} \sum_{P} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA} = \sum_{P} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, the path-counting formula (A1) is true for Case 2.3.
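As a small worked instance of this argument, consider a hypothetical pedigree (not one of the paper's figures) in which $A$ is a non-inbred founder, $f$ and $b$ are children of $A$, and $a$ is a child of $f$ with the other parent missing. The only path from $A$ to $a$ is $A \rightarrow f \rightarrow a$, so every path-triple is of Type 3, with $L_{P_{Aa}} = 2$ and $L_{P_{Ab}} = 1$. Assuming $\Phi_{AA} = \frac{1}{2}$ for the founder, Wright's formula and the recursion in (B4) give

$$\Phi_{ab} = \left(\frac{1}{2}\right)^{2 + 1} \Phi_{AA} = \frac{1}{16}, \qquad \Phi_{aab} = \frac{1}{2} \Phi_{ab} = \frac{1}{32},$$

while the Type 3 term of (A1) gives the same value directly:

$$\Phi_{aab} = \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA} = \left(\frac{1}{2}\right)^{4} \cdot \frac{1}{2} = \frac{1}{32}.$$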

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A1).


Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A1).

Proof. Proof by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, (12) and (A1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a : A_1 \rightarrow \cdots \rightarrow a,\ t_b : A_1 \rightarrow \cdots \rightarrow b,\ t_c : A_1 \rightarrow \cdots \rightarrow c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
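The two ingredients of the induction step, the set of triple-common ancestors and the choice of a topmost one, can be sketched on a parent map as follows. The function names and the example pedigree are hypothetical; an individual is counted among its own ancestors here, which is an assumption suggested by Figure 18(a), where $c$ itself is the triple-common ancestor.

```python
def ancestors(parents, x):
    """Return x together with all of its ancestors.

    parents maps an individual to a (father, mother) tuple; None marks a
    missing parent, and founders may be absent from the map.
    """
    seen = set()
    stack = [x]
    while stack:
        node = stack.pop()
        if node is None or node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, (None, None)))
    return seen


def triple_common_ancestors(parents, a, b, c):
    """Individuals from which a, b, and c can all be reached."""
    return ancestors(parents, a) & ancestors(parents, b) & ancestors(parents, c)


def topmost(parents, candidates):
    """Candidates that have no other candidate among their proper ancestors."""
    return {
        x for x in candidates
        if not (ancestors(parents, x) - {x}) & candidates
    }


# Hypothetical pedigree: A1 is a founder, A2 is a child of A1, and a, b, c are children of A2.
pedigree = {
    "A1": (None, None),
    "A2": ("A1", None),
    "a": ("A2", None),
    "b": ("A2", None),
    "c": ("A2", None),
}
common = triple_common_ancestors(pedigree, "a", "b", "c")
top = topmost(pedigree, common)
```

In this example every path-triple from `A1` passes through the lower triple-common ancestor `A2`, which is exactly the situation of case (2), where such path-triples are excluded from the contribution of $A_1$.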

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$:

For $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ with zero root overlap.


Figure 23: Dependency graph for different cases for four individuals. (Nodes: Cases 2.1, 2.2, 2.3, 3.1, 3.2, 3.3.1, 3.3.2, 3.3.3, 3.4, 3.5, 4.1, 4.2, 4.3.1, 4.3.2, and 4.3.3, together with the base coefficients $\Phi_{AA}$ and $\Phi_{AAA}$.)

For $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with two root 2-overlaps.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

For $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with two root 2-overlaps.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap and one root 3-overlap.
(C1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pair regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows:

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlaps.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlaps.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlaps and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Figure 24: Dependency graph for different cases for two pairs of individuals. (Nodes: Cases 4.1, 4.2, 4.3.1, 4.3.2, 4.4, 4.6, 4.7, and 4.8, together with the base coefficients $\Phi_{AA}$, $\Phi_{TT}$, $\Phi_{AAA}$, and $\Phi_{AAAA}$.)

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine", http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malecot, Les mathematique de l'heredite, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identite en genetique," Annales de l'Institut Henri Poincare B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identite entre deux individuals," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


Figure 9: Transforming pedigree graph $G_{k+1}$, having $k + 1$ crossovers, to $G_k$, having $k$ crossovers. In $G_{k+1}$, the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of $A \to \cdots \to x \to s \to a' \to \cdots \to a$, $A \to \cdots \to w \to s \to b' \to \cdots \to b$, and $A \to c$. In $G_k$, it consists of $A \to \cdots \to x \to s_1 \to a' \to \cdots \to a$, $A \to \cdots \to w \to s_2 \to b' \to \cdots \to b$, and $A \to c$. The legend distinguishes ancestor-descendant relationships from parent-child relationships.

Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ (scenarios $S_0$–$S_{10}$).

Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$, having $k$ crossovers, by Lemma 4.

3.1.5. Path-Counting Formula for $\Phi_{abc}$

Now we present the path-counting formula for $\Phi_{abc}$:

$$\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AA} \right), \tag{12}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$ is the inbreeding coefficient of $A$; $A$ is a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \tag{13}$$

and $L_{P_{Aa}}$ is the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).

For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
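As an illustration of how a single term of (12) could be evaluated, the following Python sketch computes the contribution of one path-triple from its path lengths. It is a minimal sketch under stated assumptions: the function names (`phi_AA`, `phi_AAA`, `triple_term`) and the calling convention are hypothetical and not part of the paper's implementation, path enumeration and type classification are assumed to be done elsewhere, and exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_AA(F_A):
    """Phi_AA = (1/2)(1 + F_A), with F_A the inbreeding coefficient of A."""
    return HALF * (1 + F_A)

def phi_AAA(F_A):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * F_A)

def triple_term(path_lengths, F_A=0, root_overlap_length=None):
    """Contribution of one path-triple <P_Aa, P_Ab, P_Ac> to formula (12).

    path_lengths: the three lengths (L_PAa, L_PAb, L_PAc).
    root_overlap_length: None for Type 1 (zero root 2-overlap); otherwise
    the length L_PAs of the single root 2-overlap path (Type 2).
    """
    total = sum(path_lengths)
    if root_overlap_length is None:
        # Type 1: L_triple is the plain sum of the three path lengths.
        return HALF ** total * phi_AAA(F_A)
    # Type 2: subtract the shared root path once, as in (13).
    l_triple = total - root_overlap_length
    return HALF ** (l_triple + 1) * phi_AA(F_A)
```

Summing `triple_term` over every acceptable path-triple of every triple-common ancestor $A$ would then give $\Phi_{abc}$, provided each triple has been classified as Type 1 or Type 2 beforehand.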

3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios, $S_0$–$S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks: $B_1$, $B_2$, and $B_3$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap, because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that for $B_3$, if $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.

3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$

For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ into one or multiple building blocks. A scenario $S_i \in \{S_1, S_3\}$ has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$, because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.
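The construction $T_2 = R_1 \times R_2$ can be sketched in Python with `itertools.product`. This is only an illustration: the case labels in `R1` and `R2` are hypothetical placeholders, since the concrete acceptable cases of each building block come from the rules in Figure 7, which are not reproduced here.

```python
from itertools import product

# Hypothetical placeholder labels for the acceptable cases of the two
# disconnected building blocks u_1 and u_2 of scenario S_2.
R1 = ["u1_case_a", "u1_case_b"]
R2 = ["u2_case_a", "u2_case_b", "u2_case_c"]

def combine_cases(*case_sets):
    """Cartesian product of the acceptable cases of disconnected blocks."""
    return list(product(*case_sets))

# Every pairing is acceptable because u_1 and u_2 share no edges.
T2 = combine_cases(R1, R2)
```

Because the blocks are disconnected, no filtering step is needed after the product; for connected blocks, conflicting combinations would have to be removed.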

Figure 11: Building blocks $B_1$, $B_2$, and $B_3$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For $B_3$, all three edges belong to root overlap (i.e., having root 3-overlap), with $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.

Table 2: Largest subgraph $S_j$ of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).

$S_i$    $S_4$    $S_5$    $S_7$    $S_8$    $S_9$    $S_{10}$
$S_j$    $S_3$    $S_3$    $S_6$    $S_5$    $S_7$    $S_9$

For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;

(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;

(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.

For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply this idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;

(2) otherwise, there can be at most two root 2-overlap paths.

3.2.3. Path-Counting Formula for $\Phi_{abcd}$

Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+2} \Phi_{AA} \right), \tag{14}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$ is the inbreeding coefficient of $A$; $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; Type 1: zero root 2-overlap and zero root 3-overlap path; Type 2: one root 2-overlap path $P_{As}$ ending at $s$; Type 3 comprises the following cases:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;

Case 2: one root 3-overlap path $P_{At}$ ending at $t$;

Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively;

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases} \tag{15}$$

and $L_{P_{Aa}}$ is the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).

For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
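The case analysis in (14) and (15) can be written as a small lookup. The sketch below is an assumption-laden illustration rather than the authors' implementation: the string labels for the types (`"type1"`, `"type3_case2"`, and so on), the `overlaps` tuple convention, and the function names are all hypothetical, and the classification of each path-quad is assumed to be supplied by the caller.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def l_quad(path_lengths, kind, overlaps=()):
    """L_quad from (15). `overlaps` holds the root-overlap path lengths."""
    total = sum(path_lengths)  # L_PAa + L_PAb + L_PAc + L_PAd
    if kind == "type1":
        return total
    if kind == "type2":
        (s,) = overlaps            # one root 2-overlap path P_As
        return total - s
    if kind == "type3_case1":
        s1, s2 = overlaps          # two root 2-overlap paths
        return total - s1 - s2
    if kind == "type3_case2":
        (t,) = overlaps            # one root 3-overlap path P_At
        return total - 2 * t
    if kind == "type3_case3":
        s, t = overlaps            # one root 2-overlap and one root 3-overlap
        return total - t - s
    raise ValueError(f"unknown kind: {kind}")

def quad_term(path_lengths, kind, overlaps=(), F_A=0):
    """Contribution of one path-quad to formula (14)."""
    L = l_quad(path_lengths, kind, overlaps)
    if kind == "type1":
        return HALF ** L * Fraction(1, 8) * (1 + 7 * F_A)        # Phi_AAAA
    if kind == "type2":
        return HALF ** (L + 1) * Fraction(1, 4) * (1 + 3 * F_A)  # Phi_AAA
    return HALF ** (L + 2) * HALF * (1 + F_A)                    # Phi_AA
```

Keeping `l_quad` separate from `quad_term` mirrors the split between (15) and (14), which makes each piece easy to check against the corresponding equation.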

Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with $A \to s \to a$, $A \to s \to b$, $A \to t \to c$, and $A \to t \to d$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with $A \to x \to a$, $A \to x \to d$, $A \to y \to b$, and $A \to y \to c$.

3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathit{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathit{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.

Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.

3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$

Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge are those defined in Section 3.1.1. Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$) and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap
One root 2-overlap path | Zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap
Two root 2-overlap paths | Zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \neq h$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $h = r_1 \neq r_2$

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define that the generation of a progenitor $p_i$ is 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.

The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$\Phi_{ab,cd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AA} \right) + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB}, \tag{16}$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$ is a common ancestor of $a$ and $b$; and $T$ is a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).

Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level (scenarios $S_0$–$S_{16}$).

Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap path $P_{Ar}$ ending at $r$.

$$\text{Type 3:} \begin{cases} \text{zero root homo-overlap and two root heter-overlap paths } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2, \text{ respectively}; \\ \text{one root homo-overlap path } P_{Ah} \text{ ending at } h \text{ and two root heter-overlap paths } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2, \text{ and } r_1 \neq r_2. \end{cases} \tag{17}$$

Type 4: one root homo-overlap path $P_{Ah}$ ending at $h$ and two root heter-overlap paths ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individuals, and $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individuals. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$B = \begin{cases} S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\ S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\ T & \text{otherwise}, \end{cases}$$

$$L_{\text{2-pair}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{Ah}} & \text{for Type 4}, \end{cases}$$

$$L_{\langle P_{Sa}, P_{Sb} \rangle} = L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5}, \qquad L_{\langle P_{Tc}, P_{Td} \rangle} = L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}. \tag{18}$$
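The generation function $\mathrm{gen}$ and the choice of $B$ used in the Type 5 term can be sketched as follows. This is a hedged illustration only: the pedigree representation (a dictionary mapping each individual to a pair of parents, with `None` for an unknown parent) and the names `generation` and `choose_B` are assumptions made for this example and do not come from the paper.

```python
def generation(individual, parents, cache=None):
    """gen(p) = 0 for a progenitor; otherwise 1 + the largest parent generation.

    parents: dict mapping an individual to a (father, mother) tuple, where a
    missing parent is None (hypothetical representation).
    """
    if cache is None:
        cache = {}
    if individual in cache:
        return cache[individual]
    known = [p for p in parents.get(individual, ()) if p is not None]
    if not known:
        gen = 0  # progenitor: no recorded parents
    else:
        gen = 1 + max(generation(p, parents, cache) for p in known)
    cache[individual] = gen
    return gen

def choose_B(S, T, parents):
    """Select B for the Type 5 term, following the case definition of B."""
    gen_S = generation(S, parents)
    gen_T = generation(T, parents)
    T_parents = [p for p in parents.get(T, ()) if p is not None]
    if gen_S < gen_T:
        return S
    if gen_S == gen_T and len(T_parents) == 2:
        return S
    return T

# Small hypothetical pedigree: f and m are progenitors.
pedigree = {
    "f": (None, None),
    "m": (None, None),
    "s": ("f", "m"),   # two parents, generation 1
    "t": ("f", None),  # one known parent, generation 1
    "u": ("s", "t"),   # generation 2
}
```

With this pedigree, `generation("u", pedigree)` evaluates to 2, and `choose_B("s", "t", pedigree)` returns `"t"` because the generations tie and `t` has only one recorded parent.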

Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \cdot \Phi_{TT}. \tag{19}$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.

3.4. Experimental Results

In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by making comparisons with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node are a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph, and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $\mathrm{NC}(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows.

(1) If $u$ is $r$, then $\mathrm{NC}(r)$ contains only one element: the empty string.

(2) Otherwise, let $u$ be a node with $\mathrm{NC}(u)$, and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in $\mathrm{NC}(u)$, a code $x i {*}$ is added to $\mathrm{NC}(v_i)$, where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
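The two assignment rules can be sketched in Python as below. Several details are assumptions made for this illustration and are not taken from the paper or from [18]: codes are plain strings formed by concatenating the sibling index and a gender letter (`"M"` or `"F"`), the pedigree is given as a dictionary of ordered child lists, and the traversal is a Kahn-style topological order rather than a plain breadth-first search, so that a node's code set is complete before it is propagated to its children.

```python
from collections import deque

def assign_nodecodes(children, gender, root):
    """Assign NC(u) for every node reachable from `root`.

    children: dict mapping a node to its children in sibling order.
    gender:   dict mapping a node to "M" or "F" (hypothetical encoding).
    """
    # Count incoming edges so a node is expanded only after all its parents.
    indegree = {}
    for u, kids in children.items():
        indegree.setdefault(u, 0)
        for v in kids:
            indegree[v] = indegree.get(v, 0) + 1

    codes = {u: set() for u in indegree}
    codes[root].add("")  # rule (1): the root holds only the empty string

    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            # Rule (2): extend every code x of u with sibling index i and
            # the gender marker of the child v.
            for x in codes[u]:
                codes[v].add(f"{x}{i}{gender[v]}")
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return codes

# Hypothetical pedigree: virtual root r, founders f and m, children c1 and c2,
# and g, whose parents are c1 and c2.
children = {
    "r": ["f", "m"],
    "f": ["c1", "c2"],
    "m": ["c1", "c2"],
    "c1": ["g"],
    "c2": ["g"],
}
gender = {"f": "M", "m": "F", "c1": "F", "c2": "M", "g": "F"}
codes = assign_nodecodes(children, gender, "r")
```

In this example each code of `g` corresponds to one of the four distinct paths from `r` to `g`, which is the property the path-counting formulas rely on.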


Computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and of the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula for more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.

Figure 14: The effect of pedigree size on computation efficiency improvement (average time in ms versus the number of individuals in the pedigree, for Recursive and NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (average time in ms versus depth, for Recursive and NodeCodes).

Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$

For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.

Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (scenarios $S_0$–$S_3$). If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the pair is represented by the single path $P_{Aa}$.

Definition A1 (Mergeable Path-Pair) A path-pair ⟨1198751198601198861

1198751198601198862⟩ is mergeable if and only if the two paths 119875

1198601198861and 119875

1198601198862

are completely identical

Next we present a graphical representation of ⟨1198751198601198861 1198751198601198862

119875119860119887⟩ in Figure 16

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

\[
\Phi_{aab} = \sum_{A} \Biggl( \sum_{\text{Type 1}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{triple}} - 1} \Phi_{AAA}
+ \sum_{\text{Type 2}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{triple}}} \Phi_{AA}
+ \sum_{\text{Type 3}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA} \Biggr),
\tag{A.1}
\]

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap.

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

\[
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases} \\
L_{\langle P_{Aa}, P_{Ab} \rangle} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A.2}
\]
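The per-type terms of (A.1) and (A.2) can be evaluated with a few lines of exact arithmetic. The sketch below is illustrative only: the function names are hypothetical, path lengths are counted in edges, and the founder values $\Phi_{AA} = 1/2$ and $\Phi_{AAA} = 1/4$ are an assumption corresponding to a non-inbred founder $A$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
PHI_AA = Fraction(1, 2)   # assumed: A is a non-inbred founder
PHI_AAA = Fraction(1, 4)  # assumed: A is a non-inbred founder


def aab_type1(l_a1, l_a2, l_b, phi_aaa):
    """Type 1 term of (A.1): no root 2-overlap, exponent L_triple - 1."""
    l_triple = l_a1 + l_a2 + l_b
    return HALF ** (l_triple - 1) * phi_aaa


def aab_type2(l_a1, l_a2, l_b, l_s, phi_aa):
    """Type 2 term of (A.1): one root 2-overlap path of length l_s."""
    l_triple = l_a1 + l_a2 + l_b - l_s
    return HALF ** l_triple * phi_aa


def aab_type3(l_a, l_b, phi_aa):
    """Type 3 term of (A.1): mergeable pair, exponent L_pair + 1."""
    return HALF ** (l_a + l_b + 1) * phi_aa


# Hypothetical pedigree: A -> f -> a, A -> m -> a, A -> b.
print(aab_type1(2, 2, 1, PHI_AAA))  # 1/64
print(aab_type3(2, 1, PHI_AA))      # 1/32
```

Using `Fraction` avoids floating-point rounding, which matters when many small powers of 1/2 are summed over a large pedigree.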

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.

A.2. Path-Counting Formula for $\Phi_{aabc}$

Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

\[
\begin{aligned}
\Phi_{aabc} ={}& \sum_{A} \Biggl( \sum_{\text{Type 1}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{quad}} - 1} \Phi_{AAAA}
+ \sum_{\text{Type 2}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{quad}}} \Phi_{AAA}
+ \sum_{\text{Type 3}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{quad}} + 1} \Phi_{AA} \Biggr) \\
&+ \sum_{A} \Biggl( \sum_{\text{Type 4}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{triple}} + 1} \Phi_{AAA}
+ \sum_{\text{Type 5}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{triple}} + 2} \Phi_{AA} \Biggr),
\end{aligned}
\tag{A.3}
\]

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap and one root 3-overlap paths $P_{As}$ and $P_{At}$ ending at $s$ and $t$, respectively.
(A.4)

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

\[
\begin{aligned}
L_{\text{quad}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases} \\
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}.
\end{cases}
\end{aligned}
\tag{A.5}
\]
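To make the bookkeeping in (A.3) and (A.5) concrete, the following sketch computes $L_{\text{quad}}$ by subtracting the lengths of whatever root overlap paths are present, and then applies the exponent shift for each type. All names are hypothetical, lengths are in edges, and the values $\Phi_{AAAA} = 1/8$ and $\Phi_{AAA} = 1/4$ are assumed for a non-inbred founder.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
PHI_AAA = Fraction(1, 4)   # assumed: A is a non-inbred founder
PHI_AAAA = Fraction(1, 8)  # assumed: A is a non-inbred founder


def l_quad(l_a1, l_a2, l_b, l_c, overlaps=()):
    """L_quad of (A.5): total path length minus root overlap path lengths."""
    return l_a1 + l_a2 + l_b + l_c - sum(overlaps)


def aabc_nonmergeable(l_quad_value, path_type, phi):
    """Types 1-3 of (A.3); phi is Phi_AAAA, Phi_AAA, or Phi_AA respectively."""
    shift = {1: -1, 2: 0, 3: 1}[path_type]
    return HALF ** (l_quad_value + shift) * phi


def aabc_mergeable(l_triple, path_type, phi):
    """Types 4-5 of (A.3); phi is Phi_AAA or Phi_AA respectively."""
    shift = {4: 1, 5: 2}[path_type]
    return HALF ** (l_triple + shift) * phi


# Hypothetical pedigree: A -> f -> a, A -> m -> a, A -> b, A -> c.
lq = l_quad(2, 2, 1, 1)                       # 6, no overlaps (Type 1)
print(aabc_nonmergeable(lq, 1, PHI_AAAA))     # 1/256
print(aabc_mergeable(2 + 1 + 1, 4, PHI_AAA))  # 1/128
```

Passing the overlap lengths as a tuple lets one function cover Type 2 and all three cases of Type 3 without separate branches.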


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 bc}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$

A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ can be condensed to $\langle P_{Aa}, P_{Ab} \rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as a default mergeable path-pair. Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

\[
\Phi_{aaab} = \sum_{A} \Biggl( \frac{3}{2} \Biggl( \sum_{\text{Type 1}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{triple}} - 1} \Phi_{AAA}
+ \sum_{\text{Type 2}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{triple}}} \Phi_{AA} \Biggr)
+ \sum_{\text{Type 3}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{pair}} + 2} \Phi_{AA} \Biggr),
\tag{A.6}
\]

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has zero root 2-overlap path.

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping.

\[
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases} \\
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A.7}
\]
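A small worked check of (A.6) may help. It is a sketch under several assumptions that are not stated in the paper: a hypothetical pedigree A -> f -> a, A -> m -> a, A -> b with $A$ a non-inbred founder ($\Phi_{AA} = 1/2$, $\Phi_{AAA} = 1/4$); the single non-mergeable triple (one path through $f$, one through $m$) counted once; and the recursion $\Phi_{aaab} = \frac{1}{4}(\Phi_{ab} + 3\Phi_{fmb})$ taken from [10]. The values $\Phi_{ab} = 1/8$ and $\Phi_{fmb} = 1/32$ were derived by hand for this pedigree.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
PHI_AA, PHI_AAA = Fraction(1, 2), Fraction(1, 4)  # assumed founder values

# Hypothetical pedigree: A -> f -> a, A -> m -> a, A -> b (lengths in edges).
l_via_f, l_via_m, l_b = 2, 2, 1

# Type 1 of (A.6): the two distinct routes to a plus the path to b, no overlap.
l_triple = l_via_f + l_via_m + l_b
type1 = HALF ** (l_triple - 1) * PHI_AAA            # 1/64

# Type 3 of (A.6): all three paths to a merged; one pair per route to a.
type3 = sum(HALF ** (l_a + l_b + 2) * PHI_AA
            for l_a in (l_via_f, l_via_m))          # 1/64 + 1/64

path_counting = Fraction(3, 2) * type1 + type3

# Cross-check against the recursion assumed above (hand-derived inputs).
phi_ab, phi_fmb = Fraction(1, 8), Fraction(1, 32)
recursive = Fraction(1, 4) * (phi_ab + 3 * phi_fmb)

print(path_counting, recursive)  # 7/128 7/128
```

The agreement here only exercises Types 1 and 3 on one tiny pedigree; it is a sanity check, not a substitute for the proof sketch in Appendix C.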

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$).

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases for $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ having mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$.

Cases for $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

(B.1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) For $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2.

(ii) For Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$.

(iii) For Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Legend: parent-child relationship; ancestor-descendant relationship.)

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof. (Basis) There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^2 \Phi_{AAc} = (1/2)^3 \Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = (1/2)^3 \Phi_{AAA}$.
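The two basis values can be reproduced numerically. The sketch below implements the recursions for two and three individuals as I understand them from [10], with several assumptions of my own: individuals are integers numbered so that parents precede children, `0` stands for a missing parent and contributes 0 (as in the proof above), and the pedigrees are my reading of Figures 18(a) and 18(b).

```python
from fractions import Fraction


def phi2(ped, a, b):
    """Kinship coefficient via recursion; ped maps id -> (father, mother)."""
    if a == 0 or b == 0:
        return Fraction(0)
    if a == b:
        f, m = ped[a]
        return Fraction(1, 2) * (1 + phi2(ped, f, m))
    a, b = max(a, b), min(a, b)  # recurse on the younger individual
    f, m = ped[a]
    return Fraction(1, 2) * (phi2(ped, f, b) + phi2(ped, m, b))


def phi3(ped, a, b, c):
    """Generalized kinship coefficient for three individuals."""
    if 0 in (a, b, c):
        return Fraction(0)
    a, b, c = sorted((a, b, c), reverse=True)  # a is the youngest
    f, m = ped[a]
    if a == b == c:
        return Fraction(1, 4) * (1 + 3 * phi2(ped, f, m))
    if a == b:
        return Fraction(1, 2) * (phi2(ped, a, c) + phi3(ped, f, m, c))
    return Fraction(1, 2) * (phi3(ped, f, b, c) + phi3(ped, m, b, c))


# Figure 18(a), as read here: c = 1 is the only parent of a = 2 and b = 3.
fig18a = {1: (0, 0), 2: (1, 0), 3: (1, 0)}
# Figure 18(b), as read here: A = 1 is the only parent of a, b, c = 2, 3, 4.
fig18b = {1: (0, 0), 2: (1, 0), 3: (1, 0), 4: (1, 0)}

print(phi3(fig18a, 2, 3, 1))  # 1/16 = (1/2)^2 * Phi_ccc with Phi_ccc = 1/4
print(phi3(fig18b, 2, 3, 4))  # 1/32 = (1/2)^3 * Phi_AAA with Phi_AAA = 1/4
```

Numbering parents before children lets the code pick the youngest individual with a simple `max`, which guarantees it is not an ancestor of the others.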

Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \rightarrow a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

\[
\begin{aligned}
\Phi_{abc} &= \frac{1}{2} \Phi_{fbc}
= \frac{1}{2} \sum_{\text{Type 1}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}
= \sum_{\text{Type 1}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AAA}, \\
\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} &= L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
\therefore\ \Phi_{abc} &= \sum_{\text{Type 1}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}.
\end{aligned}
\tag{B.2}
\]

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc}$.

Thus it is true for $n = k + 1$.

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof. (Basis) There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.


Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^3 \Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^4 \Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2 \Phi_{ssc} = (1/2)^3 \Phi_{sc} = (1/2)^5 \Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{cc} = (1/2)^{2 + 1} \Phi_{cc} = (1/2)^3 \Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4 \Phi_{AA}$ and $\Phi_{abc} = (1/2)^5 \Phi_{AA}$, respectively.
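The three Type 2 exponents above follow directly from the path lengths. As an illustrative sketch (the function name is hypothetical, and the edge counts are my reading of Figure 20, which is not reproduced here), the contribution of one path-triple with a root overlap path of length `l_s` is:

```python
from fractions import Fraction

HALF = Fraction(1, 2)
PHI_AA = Fraction(1, 2)  # assumed: the common ancestor is a non-inbred founder


def type2_contribution(l_a, l_b, l_c, l_s, phi_aa):
    """Type 2 term of formula (12): (1/2)^(L + 1) * Phi_AA with L net of overlap."""
    length = l_a + l_b + l_c - l_s
    return HALF ** (length + 1) * phi_aa


# Figure 20(a): c -> b -> a; overlap c -> b.          L = 2 + 1 + 0 - 1 = 2
# Figure 20(b): A -> b -> a, A -> c; overlap A -> b.  L = 2 + 1 + 1 - 1 = 3
# Figure 20(c): A -> s -> a, s -> b, A -> c; overlap A -> s.  L = 4
for lengths in [(2, 1, 0, 1), (2, 1, 1, 1), (2, 2, 1, 1)]:
    print(type2_contribution(*lengths, PHI_AA))
# prints 1/16, 1/32, 1/64, i.e. (1/2)^3, (1/2)^4, (1/2)^5 times Phi_AA
```

Subtracting the overlap length once is what prevents the shared segment from being counted for both paths that traverse it.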

Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path and let $p$ be a parent of $a$. Then we cut the edge $p \rightarrow a$ from $G$ and obtain a new graph $G^{*}$ which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

\[
\begin{aligned}
\Phi_{abc} &= \frac{1}{2} \Phi_{fbc}
= \frac{1}{2} \sum_{\text{Type 2}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}
= \sum_{\text{Type 2}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA}, \\
\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} &= L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
\therefore\ \Phi_{abc} &= \sum_{\text{Type 2}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA}
= \sum_{\text{Type 2}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}.
\end{aligned}
\tag{B.3}
\]

For Figures 21(b) and 21(c), we take the same steps as we did to calculate $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ to $\Phi_{aab}$ can be computed as follows: $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,

\[
\Phi_{aab} = \frac{1}{2}\bigl(\Phi_{ab} + \Phi_{fmb}\bigr) = \frac{1}{2}\bigl(\Phi_{ab} + 0\bigr) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}).
\tag{B.4}
\]

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2) \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, it shows that the path-counting formula (A.1) is true for Case 2.3.
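The reduction of Case 2.3 to $\Phi_{ab}$ can be illustrated by enumerating nonoverlapping path-pairs explicitly. This is only a sketch: the adjacency-list representation, the function names, and the example pedigrees are assumptions, and $\Phi_{AA}$ is supplied by the caller for each candidate ancestor rather than computed.

```python
from fractions import Fraction


def paths(children, src, dst):
    """All directed paths src -> ... -> dst, each as a tuple of individuals."""
    if src == dst:
        return [(src,)]
    return [(src,) + rest
            for child in children.get(src, ())
            for rest in paths(children, child, dst)]


def kinship_by_path_pairs(children, phi_self, a, b):
    """Sum (1/2)^(L_a + L_b) * Phi_AA over nonoverlapping path-pairs."""
    total = Fraction(0)
    for anc, phi_aa in phi_self.items():
        for pa in paths(children, anc, a):
            for pb in paths(children, anc, b):
                if set(pa) & set(pb) == {anc}:  # the paths share only the root
                    length = (len(pa) - 1) + (len(pb) - 1)
                    total += Fraction(1, 2) ** length * phi_aa
    return total


# Hypothetical pedigree in the spirit of Figure 22(a): A -> f -> a, A -> b.
children = {"A": ["f", "b"], "f": ["a"]}
phi_self = {"A": Fraction(1, 2), "f": Fraction(1, 2)}  # assumed non-inbred

phi_ab = kinship_by_path_pairs(children, phi_self, "a", "b")
phi_aab = Fraction(1, 2) * phi_ab  # (B.4): a has a single known parent
print(phi_ab, phi_aab)             # 1/16 1/32
```

On a pair of full siblings with two unrelated founder parents, the same function returns 1/4, the familiar kinship coefficient for that relationship.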

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).


Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. Proof by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, (12) and (A.1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we can obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a : A_1 \rightarrow \cdots \rightarrow a,\ t_b : A_1 \rightarrow \cdots \rightarrow b,\ t_c : A_1 \rightarrow \cdots \rightarrow c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, we know that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
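The additive decomposition $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$ can be seen on a small example with two unrelated triple-common ancestors (case (1) of the proof). The sketch below is deliberately restricted: it handles only path-triples without root overlap (Type 1) and raises an error otherwise, the pedigree and names are hypothetical, and $\Phi_{AAA} = 1/4$ is assumed for each non-inbred founder.

```python
from fractions import Fraction


def paths(children, src, dst):
    """All directed paths src -> ... -> dst, each as a tuple of individuals."""
    if src == dst:
        return [(src,)]
    return [(src,) + rest
            for child in children.get(src, ())
            for rest in paths(children, child, dst)]


def contribution_type1(children, anc, phi_aaa, a, b, c):
    """S(anc): sum of (1/2)^L * Phi_AAA over Type 1 path-triples from anc."""
    total = Fraction(0)
    for pa in paths(children, anc, a):
        for pb in paths(children, anc, b):
            for pc in paths(children, anc, c):
                sa, sb, sc = set(pa), set(pb), set(pc)
                if (sa & sb) | (sa & sc) | (sb & sc) != {anc}:
                    raise NotImplementedError("only Type 1 triples are handled")
                length = len(pa) + len(pb) + len(pc) - 3
                total += Fraction(1, 2) ** length * phi_aaa
    return total


# Hypothetical pedigree: full siblings a, b, c with founder parents F and M.
children = {"F": ["a", "b", "c"], "M": ["a", "b", "c"]}
phi_aaa = Fraction(1, 4)  # assumed for each non-inbred founder

s_f = contribution_type1(children, "F", phi_aaa, "a", "b", "c")
s_m = contribution_type1(children, "M", phi_aaa, "a", "b", "c")
print(s_f, s_m, s_f + s_m)  # 1/32 1/32 1/16
```

Because F and M are unrelated, their path-triples are disjoint and the two contributions simply add; handling case (2) would additionally require excluding triples whose three paths all pass through a lower triple-common ancestor.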

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Cases for $\Phi_{aaab}$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ with zero root overlap.

Cases for $\Phi_{aabc}$:

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with two root 2-overlap.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with two root 2-overlap.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap and one root 3-overlap.

(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps including Cases 3.4 and 3.5 are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formula for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

Figure 23: Dependency graph for different cases for four individuals (nodes: Cases 2.1, 2.2, 2.3, 3.1, 3.2, 3.3.1, 3.3.2, 3.3.3, 3.4, 3.5, 4.1, 4.2, 4.3.1, 4.3.2, 4.3.3, together with $\Phi_{AAA}$ and $\Phi_{AA}$).

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pair regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlap.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlap.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlap and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formula for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8, accordingly.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Figure 24: Dependency graph for different cases for two pairs of individuals (nodes: Cases 4.1, 4.2, 4.3.1, 4.3.2, 4.4, 4.6, 4.7, 4.8, together with $\Phi_{AAAA}$, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{TT}$).

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General’s New Family Health History Tool Is Released, Ready for “21st Century Medicine,” http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., “A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia,” The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., “New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate,” Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malecot, Les mathématiques de l’hérédité, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, “La relation d’identité en génétique,” Annales de l’Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, “Genotypic covariances between inbred relatives,” Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, “Logique du calcul des coefficients d’identité entre deux individuals,” Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, “A recursive algorithm for the calculation of identity coefficients,” Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, “Efficient evaluation of inbreeding queries on pedigree data,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, “Efficiently calculating inbreeding on large pedigrees databases,” Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, “Using compact encodings for path-based computations on pedigree graphs,” in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB ’11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Scalable computation of kinship and identity coefficients on large pedigrees,” in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB ’08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Efficient computation of kinship and identity coefficients on large pedigrees,” Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, “Coefficients of inbreeding and relationship,” The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, “Kinship and identity algorithm of coefficients of identity,” Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., “Data mining applied to linkage disequilibrium mapping,” The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, “Calculation of the inbreeding coefficient,” Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.



Figure 11: Building blocks $B_1$, $B_2$, and $B_3$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For $B_3$, all three edges belong to root overlap (i.e., having root 3-overlap), with $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.

Table 2: Largest subgraph of a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$).

$S_i$ | $S_4$ | $S_5$ | $S_7$ | $S_8$ | $S_9$ | $S_{10}$
$S_j$ | $S_3$ | $S_3$ | $S_6$ | $S_5$ | $S_7$ | $S_9$

For $S_6 = u_1 = B_3$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \ne 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;

(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;

(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.

For a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply the same idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;

(2) otherwise, there can be at most two root 2-overlap paths.

3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

\[
\Phi_{abcd} = \sum_{A} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+2} \Phi_{AA} \Biggr), \tag{14}
\]

where $\Phi_{AA} = (1/2)(1 + F_A)$, $\Phi_{AAA} = (1/4)(1 + 3F_A)$, $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$.

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$, $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;

Case 2: one root 3-overlap path $P_{At}$ ending at $t$;

Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively.

\[
L_{\text{quad}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases}
\tag{15}
\]

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).

For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
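As a hedged illustration of how a single path-quad contributes to the path-counting sum for $\Phi_{abcd}$, the following Python sketch evaluates one term from the four path lengths and the lengths of the root overlap paths. The names `phi_ancestor`, `l_quad`, and `quad_term` and the case labels are assumptions made for this example only; enumerating the path-quads and classifying them into types is assumed to happen elsewhere.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_ancestor(copies, f_a):
    """Phi_AA, Phi_AAA, Phi_AAAA for an ancestor A with inbreeding coefficient f_a."""
    if copies == 2:
        return Fraction(1, 2) * (1 + f_a)
    if copies == 3:
        return Fraction(1, 4) * (1 + 3 * f_a)
    if copies == 4:
        return Fraction(1, 8) * (1 + 7 * f_a)
    raise ValueError("copies must be 2, 3, or 4")

def l_quad(lengths, case, overlaps=()):
    """L_quad for one path-quad; `overlaps` holds the root overlap path lengths."""
    total = sum(lengths)  # L_PAa + L_PAb + L_PAc + L_PAd
    if case == "type1":
        return total
    if case == "type2":          # one root 2-overlap path of length s
        (s,) = overlaps
        return total - s
    if case == "type3_case1":    # two root 2-overlap paths
        s1, s2 = overlaps
        return total - s1 - s2
    if case == "type3_case2":    # one root 3-overlap path of length t
        (t,) = overlaps
        return total - 2 * t
    if case == "type3_case3":    # one root 3-overlap (t) and one root 2-overlap (s),
        t, s = overlaps          # subtracted as printed in the text
        return total - t - s
    raise ValueError(f"unknown case: {case}")

def quad_term(lengths, case, f_a, overlaps=()):
    """Contribution of one path-quad to Phi_abcd."""
    length = l_quad(lengths, case, overlaps)
    if case == "type1":
        return HALF ** length * phi_ancestor(4, f_a)
    if case == "type2":
        return HALF ** (length + 1) * phi_ancestor(3, f_a)
    return HALF ** (length + 2) * phi_ancestor(2, f_a)  # all Type 3 cases
```

For example, if $a$, $b$, $c$, and $d$ are all children of a non-inbred ancestor $A$, the only path-quad through $A$ is of Type 1 with $L_{\text{quad}} = 4$, and `quad_term([1, 1, 1, 1], "type1", Fraction(0))` evaluates to $(1/2)^4 \cdot 1/8 = 1/128$.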


Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with $A \to s \to a$, $A \to s \to b$, $A \to t \to c$, and $A \to t \to d$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with $A \to x \to a$, $A \to x \to d$, $A \to y \to b$, and $A \to y \to c$.

3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two pairs of path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathrm{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathrm{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathrm{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.

Example 8. $A$ is a quad-common ancestor for $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$; $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; and $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$; $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; and $A \to x$ and $A \to y$ are root heter-overlap paths.

3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge are those defined in Section 3.1.1. Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap
 | Zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap
 | Zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \ne h$
 | One root homo-overlap and two root heter-overlaps, and $h = r_1 \ne r_2$

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define that the generation of a progenitor $p_i$ is 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.

The path-counting formula for $\Phi_{ab,cd}$ is as follows:

\[
\begin{aligned}
\Phi_{ab,cd} = {} & \sum_{A} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AAA} \\
& \quad + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AA} \Biggr) \\
& + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB},
\end{aligned}
\tag{16}
\]

where $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$: a common ancestor of $a$ and $b$; and $T$: a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).


Figure 13: Scenarios $S_0$ to $S_{16}$ of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level.

Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

Type 3: either zero root homo-overlap and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively; or one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, and $r_1 = r_2$. (17)

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \ne T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individual; $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individual. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \ne S$ and $x \ne T$.

\[
B =
\begin{cases}
S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\
S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\
T & \text{otherwise},
\end{cases}
\]

\[
\begin{aligned}
L_{\text{2-pair}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{Ah}} & \text{for Type 4},
\end{cases} \\
L_{\langle P_{Sa}, P_{Sb} \rangle} &= L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5}, \\
L_{\langle P_{Tc}, P_{Td} \rangle} &= L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}.
\end{aligned}
\tag{18}
\]

Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

\[
\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \cdot \Phi_{TT}. \tag{19}
\]

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
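The generation function gen and the choice of $B$ used in the Type 5 term of this section can be sketched in a few lines of Python. This is a minimal illustration, assuming the pedigree is given as a dictionary `parents` mapping every individual to a tuple of zero, one, or two parent identifiers (progenitors map to an empty tuple); the function names `generations` and `choose_b` are hypothetical.

```python
def generations(parents):
    """Return gen(v) for every individual reachable in the pedigree.

    gen(progenitor) = 0; otherwise gen(v) = max(gen of parents) + 1.
    """
    gen = {}

    def visit(v):
        if v not in gen:
            ps = parents.get(v, ())
            gen[v] = 0 if not ps else max(visit(p) for p in ps) + 1
        return gen[v]

    for v in parents:
        visit(v)
    return gen


def choose_b(s, t, gen, parents):
    """Select B from the common ancestors S and T as in the case definition of B."""
    if gen[s] < gen[t]:
        return s
    if gen[s] == gen[t] and len(parents.get(t, ())) == 2:
        return s
    return t


# Tiny example pedigree: p and q are progenitors.
pedigree = {"p": (), "q": (), "s": ("p", "q"), "t": ("p",), "u": ("s", "t")}
gen = generations(pedigree)
```

In this toy pedigree, `s` and `t` are both in generation 1 and `u` is in generation 2. The recursion depth is bounded by the number of generations, which is small for the pedigrees discussed here (at most 19 generations in the experiments).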

3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by making comparisons with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node is a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $NC(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows.

(1) If $u$ is $r$, then $NC(r)$ contains only one element: the empty string.

(2) Otherwise, let $u$ be a node with $NC(u)$ and $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in $NC(u)$, a code $x i *$ is added to $NC(v_i)$, where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
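The two assignment rules above can be sketched as follows. This is only an illustrative Python sketch under stated assumptions, not the implementation used in [11, 12, 14, 15, 18]: the function name `assign_nodecodes`, the input dictionaries, and the textual code format (sibling index, gender letter, and a trailing `.` as separator) are all choices made for this example. The traversal visits a node only after all of its parents have been visited, so that $NC(u)$ is complete before it is propagated to the children of $u$.

```python
from collections import deque

def assign_nodecodes(children, gender, root):
    """Assign NodeCodes to every node of a pedigree DAG with a single progenitor.

    children: dict mapping a node to the list of its children in sibling order
    gender:   dict mapping a node to a gender label, e.g. "M" or "F"
    root:     the progenitor r (in-degree 0)
    """
    # Count parents so a node is expanded only after all its parents are done.
    indegree = {root: 0}
    for u, kids in children.items():
        indegree.setdefault(u, 0)
        for v in kids:
            indegree[v] = indegree.get(v, 0) + 1

    nc = {u: set() for u in indegree}
    nc[root].add("")  # rule (1): NC(r) contains only the empty string

    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            for x in nc[u]:
                # rule (2): extend each code of u by the sibling index and gender of v
                nc[v].add(f"{x}{i}{gender[v]}.")
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return nc


# Example: r has children f and m, who are both parents of c.
example_children = {"r": ["f", "m"], "f": ["c"], "m": ["c"]}
example_gender = {"f": "M", "m": "F", "c": "F"}
codes = assign_nodecodes(example_children, example_gender, "r")
```

Here `c` receives two codes, one per path from `r`, which reflects the statement that each label represents a path to the node from its ancestors.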


Computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied for the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.

Figure 14: The effect of pedigree size on computation efficiency improvement (x-axis: individuals in pedigree; y-axis: average time (ms); series: Recursive and NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (x-axis: depth; y-axis: average time (ms); series: Recursive and NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright’s path-counting formula for more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficients computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.

Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.


Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (scenarios $S_0$ to $S_3$; if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the two paths are represented by a single path $P_{Aa}$).

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

\[
\Phi_{aab} = \sum_{A} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}-1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle}+1} \Phi_{AA} \Biggr), \tag{A.1}
\]

where $A$: a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap.

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

\[
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases} \\
L_{\langle P_{Aa}, P_{Ab} \rangle} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A.2}
\]

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
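To make the role of the three types in the formula for $\Phi_{aab}$ concrete, the following Python sketch evaluates the contribution of a single path configuration. It is a minimal illustration under stated assumptions: the function names `phi_aa`, `phi_aaa`, and `aab_term` are hypothetical, the argument `length` is assumed to already be $L_{\text{triple}}$ (Types 1 and 2) or $L_{\langle P_{Aa}, P_{Ab} \rangle}$ (Type 3), and finding and classifying the paths is assumed to be done elsewhere.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_aa(f_a):
    """Phi_AA = (1/2)(1 + F_A)."""
    return Fraction(1, 2) * (1 + f_a)

def phi_aaa(f_a):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * f_a)

def aab_term(path_type, length, f_a):
    """Contribution of one path configuration to Phi_aab.

    path_type 1, 2: <P_Aa1, P_Aa2> is not mergeable, length = L_triple
    path_type 3:    <P_Aa1, P_Aa2> is mergeable,     length = L_<P_Aa, P_Ab>
    """
    if path_type == 1:
        return HALF ** (length - 1) * phi_aaa(f_a)
    if path_type == 2:
        return HALF ** length * phi_aa(f_a)
    if path_type == 3:
        return HALF ** (length + 1) * phi_aa(f_a)
    raise ValueError("path_type must be 1, 2, or 3")
```

As a sanity check, let $a$ and $b$ both be children of a non-inbred ancestor $A$, with no other common ancestor. There is only one path from $A$ to $a$, so the only configuration is Type 3 with length 2, and `aab_term(3, 2, Fraction(0))` gives $(1/2)^3 \cdot 1/2 = 1/16$. This agrees with the recursive formula of [10], $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$, with $\Phi_{ab} = 1/8$ and $\Phi_{fmb} = 0$ when the second parent of $a$ is unrelated to $A$ and $b$.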

A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

\[
\begin{aligned}
\Phi_{aabc} = {} & \sum_{A} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}-1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AA} \Biggr) \\
& + \sum_{A} \Biggl( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+2} \Phi_{AA} \Biggr),
\end{aligned}
\tag{A.3}
\]

where $A$: a common ancestor of $a$, $b$, and $c$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;

Case 2: one root 3-overlap path $P_{At}$ ending at $t$;

Case 3: one root 2-overlap path and one root 3-overlap path, $P_{As}$ and $P_{At}$, ending at $s$ and $t$, respectively. (A.4)

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

\[
\begin{aligned}
L_{\text{quad}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases} \\
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}.
\end{cases}
\end{aligned}
\tag{A.5}
\]


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula for $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$

A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. Hence, for a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the default mergeable path-pair.

Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

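$$\Phi_{aaab} = \sum_{A}\left(\frac{3}{2}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\right) + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\right) \tag{A.6}$$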

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap paths;

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.7}$$
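As an illustration only, formulas (A.6) and (A.7) can be evaluated mechanically once the path tuples of one common ancestor $A$ have been enumerated and classified. The sketch below assumes that classification is already done; the function name `phi_aaab_contribution` and the tuple layout `(kind, lengths, overlap_len)` are hypothetical and not part of the paper.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_aaab_contribution(path_items, phi_AA, phi_AAA):
    """Contribution of one common ancestor A to Phi_aaab, following (A.6)/(A.7).

    path_items: iterable of (kind, lengths, overlap_len) tuples (hypothetical layout):
      kind 1 -> lengths of (P_Aa1, P_Aa3, P_Ab), no root 2-overlap path
      kind 2 -> lengths of (P_Aa1, P_Aa3, P_Ab), overlap_len = length of P_As
      kind 3 -> lengths of (P_Aa, P_Ab), a nonoverlapping path-pair
    """
    inner = Fraction(0)   # Type 1 and Type 2 terms, later scaled by 3/2
    type3 = Fraction(0)   # Type 3 terms
    for kind, lengths, overlap_len in path_items:
        if kind == 1:
            l_triple = sum(lengths)
            inner += HALF ** (l_triple - 1) * phi_AAA
        elif kind == 2:
            l_triple = sum(lengths) - overlap_len
            inner += HALF ** l_triple * phi_AA
        elif kind == 3:
            l_pair = sum(lengths)
            type3 += HALF ** (l_pair + 2) * phi_AA
        else:
            raise ValueError(f"unknown path type: {kind}")
    return Fraction(3, 2) * inner + type3


# Half-sibs a and b sharing a non-inbred founder A (Phi_AA = 1/2, Phi_AAA = 1/4):
# the only contribution is the Type 3 pair <A->a, A->b> with L_pair = 2.
value = phi_aaab_contribution([(3, (1, 1), 0)], Fraction(1, 2), Fraction(1, 4))
print(value)
```

For this half-sib case the recursive route gives the same number: $\Phi_{aaab} = \frac{1}{4}\Phi_{ab}$ when the other parent of $a$ is unrelated, and $\Phi_{ab} = \frac{1}{8}$.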

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1a_2a_3b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1a_2a_3b}$.

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{ab}$, $\Phi_{AA}$, and $\Phi_{AAA}$).

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases for $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.

Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

Cases for $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

(B.1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Legend: parent-child relationship; ancestor-descendant relationship.)

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = \frac{1}{2}\Phi_{cbc} = \left(\frac{1}{2}\right)^{2}\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = \frac{1}{2}\Phi_{Abc} = \left(\frac{1}{2}\right)^{2}\Phi_{AAc} = \left(\frac{1}{2}\right)^{3}\Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows:

$$\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}, \quad \text{where } L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}.$$

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc} = \left(\frac{1}{2}\right)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \left(\frac{1}{2}\right)^{3}\Phi_{AAA}$.

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show that it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis.

For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$. Based on the recursive formula (3), $\Phi_{abc} = \frac{1}{2}(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are the parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA}, \\ &\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\ &\therefore\ \Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}. \end{aligned} \tag{B.2}$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1}\Phi_{ccc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc}$.

Thus it is true for $n = k + 1$.
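The recursive side of this comparison can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes the recursions used in this appendix ($\Phi_{abc} = \frac{1}{2}(\Phi_{fbc} + \Phi_{mbc})$ and $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$) together with Karigl's boundary formulas $\Phi_{aa} = \frac{1}{2}(1 + \Phi_{fm})$ and $\Phi_{aaa} = \frac{1}{4}(1 + 3\Phi_{fm})$. The pedigree encoding (integer ids with every parent id smaller than its child's id, `None` for an unknown parent) and the name `make_kinship` are assumptions made for this example.

```python
from fractions import Fraction
from functools import lru_cache


def make_kinship(ped):
    """Build recursive evaluators for Phi_ab and Phi_abc.

    ped maps an integer id to (father, mother); None marks an unknown parent.
    Assumption: every parent id is smaller than its child's id, so the
    largest id among the arguments is never an ancestor of the others.
    """

    @lru_cache(maxsize=None)
    def phi2(a, b):
        if a is None or b is None:
            return Fraction(0)
        if a == b:
            f, m = ped[a]
            return Fraction(1, 2) * (1 + phi2(f, m))
        if a < b:
            a, b = b, a           # recurse on the younger individual
        f, m = ped[a]
        return Fraction(1, 2) * (phi2(f, b) + phi2(m, b))

    @lru_cache(maxsize=None)
    def phi3(a, b, c):
        if a is None or b is None or c is None:
            return Fraction(0)
        x, y, z = sorted((a, b, c), reverse=True)
        f, m = ped[x]
        if x == y == z:           # Phi_aaa
            return Fraction(1, 4) * (1 + 3 * phi2(f, m))
        if x == y:                # Phi_aab, a not an ancestor of b
            return Fraction(1, 2) * (phi2(x, z) + phi3(f, m, z))
        return Fraction(1, 2) * (phi3(f, y, z) + phi3(m, y, z))

    return phi2, phi3


# Figure 18(b)-style pedigree: founder 0 is the only known parent of 1, 2, 3.
_, phi3_b = make_kinship({0: (None, None), 1: (0, None), 2: (0, None), 3: (0, None)})
print(phi3_b(1, 2, 3))   # (1/2)^3 * Phi_AAA with Phi_AAA = 1/4

# Figure 18(a)-style pedigree: 0 is the only known parent of 1 and 2.
_, phi3_a = make_kinship({0: (None, None), 1: (0, None), 2: (0, None)})
print(phi3_a(1, 2, 0))   # (1/2)^2 * Phi_ccc with Phi_ccc = 1/4
```

Exact fractions are used so that the recursive values can be compared with the path-counting values without floating-point noise.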

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.

Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = \frac{1}{2}\Phi_{bbc} = \left(\frac{1}{2}\right)^{2}\Phi_{bc} = \left(\frac{1}{2}\right)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = \frac{1}{2}\Phi_{bbc} = \left(\frac{1}{2}\right)^{2}\Phi_{bc} = \left(\frac{1}{2}\right)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = \left(\frac{1}{2}\right)^{2}\Phi_{ssc} = \left(\frac{1}{2}\right)^{3}\Phi_{sc} = \left(\frac{1}{2}\right)^{5}\Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows:

$$\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}, \quad \text{where } L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$$

and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}+1}\Phi_{cc} = \left(\frac{1}{2}\right)^{2+1}\Phi_{cc} = \left(\frac{1}{2}\right)^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{4}\Phi_{AA}$ and $\Phi_{abc} = \left(\frac{1}{2}\right)^{5}\Phi_{AA}$, respectively.
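The per-triple contribution used in the basis cases of Cases 3.1 and 3.2 is simple enough to check numerically. The helper below is a hypothetical sketch (the name `triple_contribution` and its argument layout are not from the paper); it takes the three path lengths and the length of the root overlap path $P_{As}$, with 0 meaning no root overlap. The path lengths passed for Figure 20 are one reading of the figure, chosen so that they reproduce the exponents stated in the text.

```python
from fractions import Fraction


def triple_contribution(len_a, len_b, len_c, overlap_len, phi_AA, phi_AAA):
    """Contribution of one path-triple <P_Aa, P_Ab, P_Ac> to Phi_abc.

    overlap_len is the length of the root overlap path P_As
    (0 means no root overlap, i.e. Type 1; otherwise Type 2).
    """
    half = Fraction(1, 2)
    length = len_a + len_b + len_c - overlap_len
    if overlap_len == 0:
        return half ** length * phi_AAA        # Type 1
    return half ** (length + 1) * phi_AA       # Type 2


PHI_AA = Fraction(1, 2)    # non-inbred ancestor
PHI_AAA = Fraction(1, 4)

# Figure 20(a): c -> b -> a, root overlap c -> b of length 1.
print(triple_contribution(2, 1, 0, 1, PHI_AA, PHI_AAA))   # (1/2)^3 * Phi_cc
# Figure 20(b), assumed lengths: A -> b -> a, A -> b, A -> c.
print(triple_contribution(2, 1, 1, 1, PHI_AA, PHI_AAA))   # (1/2)^4 * Phi_AA
# Figure 20(c), assumed lengths: A -> s -> a, A -> s -> b, A -> c.
print(triple_contribution(2, 2, 1, 1, PHI_AA, PHI_AAA))   # (1/2)^5 * Phi_AA
```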

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$ which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = \frac{1}{2}\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA}, \\ &\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\ &\therefore\ \Phi_{abc} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}. \end{aligned} \tag{B.3}$$

For Figures 21(b) and 21(c), we take the same steps as in the calculation of $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as follows:

$$\sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}, \quad \text{where } L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}.$$

Using the recursive formula (4), we obtain $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$ (as $m$ is missing),

$$\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb}) = \frac{1}{2}(\Phi_{ab} + 0) = \frac{1}{2}\Phi_{ab}. \tag{B.4}$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.

Then we have $\Phi_{aab} = \frac{1}{2}\Phi_{ab} = \frac{1}{2}\sum_{P}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA} = \sum_{P}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, the path-counting formula (A.1) is true for Case 2.3.
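As a small worked check of this argument, consider a hypothetical pedigree (not one of the paper's figures) in which a non-inbred founder $A$ is the only known parent of $f$ and of $b$, and $f$ is the only known parent of $a$. Then $\Phi_{AA} = \frac{1}{2}$, the only nonoverlapping path-pair is $\langle A \to f \to a,\ A \to b\rangle$ with $L_{\langle P_{Aa}, P_{Ab}\rangle} = 2 + 1 = 3$, and the recursive and path-counting routes agree:

$$\begin{aligned} \Phi_{ab} &= \left(\frac{1}{2}\right)^{3}\Phi_{AA} = \frac{1}{16}, \\ \Phi_{aab} &= \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\frac{1}{16} + 0\right) = \frac{1}{32}, \\ \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA} &= \left(\frac{1}{2}\right)^{4}\cdot\frac{1}{2} = \frac{1}{32}. \end{aligned}$$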

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1a_2b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven above. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).

Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. By induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.

Induction Step. Now we show that the claim is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathrm{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathrm{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. That is, $\mathrm{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \to \cdots \to a,\ t_b\colon A_1 \to \cdots \to b,\ t_c\colon A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
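The objects manipulated in this induction (the set of triple-common ancestors and its topmost member $A_1$) are easy to compute on a pedigree stored as a parent map. The sketch below is illustrative only; the function names and the dictionary encoding are assumptions. Following the usage in Figure 18(a), where $c$ itself is the triple-common ancestor of $a$, $b$, and $c$, an individual is counted among its own ancestors.

```python
def ancestors(ped, x):
    """Return x together with all of its ancestors.

    ped maps an individual to (father, mother); None marks an unknown parent.
    """
    seen = set()
    stack = [x]
    while stack:
        v = stack.pop()
        if v is None or v in seen:
            continue
        seen.add(v)
        stack.extend(ped.get(v, (None, None)))
    return seen


def triple_common_ancestors(ped, a, b, c):
    """Individuals from which a, b, and c can all be reached."""
    return ancestors(ped, a) & ancestors(ped, b) & ancestors(ped, c)


def topmost(ped, common):
    """Members of `common` that have no other member of `common` as an ancestor."""
    return {A for A in common if not ((ancestors(ped, A) - {A}) & common)}


# Hypothetical pedigree: A1 is a founder, A2 is a child of A1,
# a and b are children of A2, and c is a child of A2 and A1.
ped = {
    "A1": (None, None),
    "A2": ("A1", None),
    "a": ("A2", None),
    "b": ("A2", None),
    "c": ("A2", "A1"),
}
common = triple_common_ancestors(ped, "a", "b", "c")
print(sorted(common))                 # A1 and A2 both reach a, b, and c
print(sorted(topmost(ped, common)))   # A1 has no other common ancestor above it
```

In this example $A_2$ is a descendant of $A_1$, so it falls under case (2): a path-triple from $A_1$ whose three paths all pass through $A_2$ would be excluded when computing $S(A_1)$.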

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Cases for $\Phi_{aaab}$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.

Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.

Figure 23: Dependency graph for different cases for four individuals (nodes: the cases listed in (C.1), together with the ancestor terms such as $\Phi_{AA}$ and $\Phi_{AAA}$).

Cases for $\Phi_{aabc}$:

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.

Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlaps.

Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.

Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.

Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.

Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.

Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlaps.

Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.

Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.

(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlaps.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlaps.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlaps and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Figure 24: Dependency graph for different cases for two pairs of individuals (nodes: Cases 4.1, 4.2, 4.3.1, 4.3.2, 4.4, 4.6, 4.7, and 4.8, together with $\Phi_{AA}$, $\Phi_{TT}$, $\Phi_{AAA}$, and $\Phi_{AAAA}$).

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Ohio State University, Columbus, Ohio, USA, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


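Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with $P_{Aa}\colon A \to s \to a$, $P_{Ab}\colon A \to s \to b$, $P_{Ac}\colon A \to t \to c$, and $P_{Ad}\colon A \to t \to d$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with $P_{Aa}\colon A \to x \to a$, $P_{Ad}\colon A \to x \to d$, $P_{Ab}\colon A \to y \to b$, and $P_{Ac}\colon A \to y \to c$.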

3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td})\rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b\rangle$ and $\langle c, d\rangle$, if $s \in \mathrm{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathrm{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathrm{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.

Example 8 119860 is quad-common ancestor for 119886 119887 119888 and 119889 inFigure 12 For (a) 119904 is a homo-overlap individual between 119875

119860119886

and 119875119860119887

119905 is a homo-overlap individual between 119875119860119888

and 119875119860119889 And

119860 rarr 119904 and 119860 rarr 119905 are root homo-overlap paths For (b) 119909 isa heter-overlap individual between 119875

119860119886and 119875

119860119889 119910 is a heter-

overlap individual between 119875119860119887

and 119875119860119888 And 119860 rarr 119909 and

119860 rarr 119910 are root heter-overlap paths

3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. Each edge takes one of the options defined in Section 3.1.1. Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

| $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ |
| Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap |
| One root 2-overlap path | One root homo-overlap and zero root heter-overlap |
| | Zero root homo-overlap and one root heter-overlap |
| Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap |
| | Zero root homo-overlap and two root heter-overlaps |
| One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$ |
| One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \neq h$ |
| | One root homo-overlap and two root heter-overlaps, and $h = r_1 \neq r_2$ |

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
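The generation rule above can be sketched in a few lines of Python. This is only an illustration: the function name `generations` and the dictionary-of-parents representation are assumptions made here, not part of the original method.

```python
def generations(parents):
    """Return gen(x) for every individual reachable from `parents`.

    `parents` (a hypothetical representation) maps an individual to a tuple
    of its known parents (0, 1, or 2 entries). Individuals that never appear
    as a key, or that map to an empty tuple, are treated as progenitors.
    """
    gen = {}

    def visit(x):
        if x not in gen:
            known = parents.get(x, ())
            # Progenitors have generation 0; everyone else is one more than
            # the largest generation among the known parents.
            gen[x] = 0 if not known else max(visit(p) for p in known) + 1
        return gen[x]

    for individual in parents:
        visit(individual)
    return gen


# Example pedigree: A -> s, A -> t, s and t are the parents of a, a -> b.
example = {"s": ("A",), "t": ("A",), "a": ("s", "t"), "b": ("a",)}
example_gen = generations(example)
```

The recursion is adequate for small examples; for pedigrees spanning many generations an iterative topological traversal would avoid deep call stacks.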

The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$
\Phi_{ab,cd} = \sum_{A} \Biggl( \sum_{\text{Type 1}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{2-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{2-pair}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{2-pair}}+2} \Phi_{AA} + \sum_{\text{Type 4}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{2-pair}}+1} \Phi_{AA} \Biggr) + \sum_{(S,T) \in \text{Type 5}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB}, \tag{16}
$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$ is a common ancestor of $a$ and $b$; and $T$ is a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).

Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level (scenarios $S_0$ to $S_{16}$ over the paths $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$).

Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

Type 3: either zero root homo-overlap and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively; or one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, and $r_1 \neq r_2$.

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$. (17)

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individual; $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individual. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x = S$ and $x = T$.

$$
B = \begin{cases}
S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\
S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\
T & \text{otherwise},
\end{cases}
$$

$$
L_{\text{2-pair}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \ast L_{P_{Ah}} & \text{for Type 4},
\end{cases}
$$

$$
L_{\langle P_{Sa}, P_{Sb} \rangle} = L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5}, \qquad
L_{\langle P_{Tc}, P_{Td} \rangle} = L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}. \tag{18}
$$
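As a rough illustration of how (16) and (18) combine for a single 2-pair-path-pair rooted at a quad-common ancestor $A$, the sketch below evaluates the Type 1 to Type 4 terms from path lengths. The helper name `two_pair_contribution` and its argument layout are hypothetical. The example values assume that $A$ is a non-inbred founder, so that $\Phi_{AA} = 1/2$ and $\Phi_{AAA} = 1/4$ (from the recursive formulas $\Phi_{aa} = (1 + \Phi_{fm})/2$ and $\Phi_{aaa} = (1 + 3\Phi_{fm})/4$).

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def two_pair_contribution(path_type, lengths, phi_AA, phi_AAA, overlap_lengths=()):
    """Contribution of one <(P_Aa, P_Ab), (P_Ac, P_Ad)> to Phi_{ab,cd}.

    lengths         -- the four path lengths (L_PAa, L_PAb, L_PAc, L_PAd)
    overlap_lengths -- lengths of the root overlap paths used by (18):
                       ()           for Type 1
                       (L_PAr,)     for Type 2
                       (L_PAr1, L_PAr2) for Type 3
                       (L_PAh,)     for Type 4
    """
    total = sum(lengths)
    if path_type == 1:
        return HALF ** total * phi_AAA
    if path_type == 2:
        (l_r,) = overlap_lengths
        return HALF ** (total - l_r + 1) * phi_AAA
    if path_type == 3:
        l_r1, l_r2 = overlap_lengths
        return HALF ** (total - l_r1 - l_r2 + 2) * phi_AA
    if path_type == 4:
        (l_h,) = overlap_lengths
        return HALF ** (total - 2 * l_h + 1) * phi_AA
    raise ValueError("path_type must be 1, 2, 3, or 4")


# Figure 12(b): A->x->a, A->x->d, A->y->b, A->y->c.
# Zero root homo-overlap, two root heter-overlaps (A->x and A->y): Type 3.
fig12b = two_pair_contribution(
    3, (2, 2, 2, 2), Fraction(1, 2), Fraction(1, 4), overlap_lengths=(1, 1)
)
```

Under the stated assumptions, `fig12b` evaluates to $(1/2)^{8-1-1+2} \cdot \Phi_{AA} = 1/512$, which agrees with expanding $\Phi_{ab,cd}$ for this small pedigree by the recursive formulas.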

Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$
\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \ast \Phi_{TT}. \tag{19}
$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.

3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by making comparisons with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node are a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $\mathrm{NC}(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows.

(1) If $u$ is $r$, then $\mathrm{NC}(r)$ contains only one element: the empty string.

(2) Otherwise, let $u$ be a node with $\mathrm{NC}(u)$ and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then, for each $x$ in $\mathrm{NC}(u)$, a code $x\,i\,{\ast}$ is added to $\mathrm{NC}(v_i)$, where $0 \le i \le k$ and $\ast$ indicates the gender of the individual represented by node $v_i$.
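The two assignment rules above can be sketched as follows. Several details are assumptions made for illustration only: the function name `assign_nodecodes`, the representation of a code as a tuple of `(sibling_index, gender)` steps instead of a concatenated string, and the use of a queue that releases a node only after all of its parents have been processed, so that $\mathrm{NC}(u)$ is complete before it is propagated to $u$'s children.

```python
from collections import deque


def assign_nodecodes(children, gender, root):
    """Assign NodeCodes to every node of a pedigree DAG.

    children -- dict: node -> list of its children in sibling order
    gender   -- dict: node -> gender marker (e.g. 'M' or 'F')
    root     -- the single progenitor (node with 0 in-degree)
    Returns a dict: node -> list of codes, one code per path from the root.
    """
    indegree = {root: 0}
    for u, kids in children.items():
        indegree.setdefault(u, 0)
        for v in kids:
            indegree[v] = indegree.get(v, 0) + 1

    codes = {u: [] for u in indegree}
    codes[root] = [()]                      # rule (1): the empty code
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            for x in codes[u]:              # rule (2): extend each code of u
                codes[v].append(x + ((i, gender[v]),))
            indegree[v] -= 1
            if indegree[v] == 0:            # all parents of v are done
                queue.append(v)
    return codes


# Small example: r has children f and m; both are parents of c.
kids = {"r": ["f", "m"], "f": ["c"], "m": ["c"]}
sex = {"r": "M", "f": "M", "m": "F", "c": "F"}
nc = assign_nodecodes(kids, sex, "r")
```

In this example `nc["c"]` holds two codes, one for the path through `f` and one for the path through `m`, which reflects the property that each NodeCode represents one path from an ancestor to the node.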


The computations of kinship coefficients for two individuals and of generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and of the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.

Figure 14: The effect of pedigree size on computation efficiency improvement (average time in ms versus individuals in pedigree, for Recursive and NodeCodes; pedigree sizes 77, 181, 383, 769, 1558, 3105, 6174, 12351, 24667, 49761, 98328, and 195197).

Figure 15: The effect of depth on computation efficiency improvement (average time in ms versus depth, for depths 4 to 18, for Recursive and NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula for more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on the computation of condensed identity coefficients by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.

Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (scenarios $S_0$ to $S_3$; if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the two paths are condensed to $P_{Aa}$).

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$
\Phi_{aab} = \sum_{A} \Biggl( \sum_{\text{Type 1}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{triple}}-1} \Phi_{AAA} + \sum_{\text{Type 2}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Aa}, P_{Ab} \rangle}+1} \Phi_{AA} \Biggr), \tag{A.1}
$$

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap.

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

$$
L_{\text{triple}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases}
\qquad
L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.2}
$$

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
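As a small sanity check of the Type 3 term in (A.1), consider a hypothetical pedigree (not one of the paper's figures) in which $A$ is a non-inbred founder, so that $\Phi_{AA} = 1/2$, and $a$ and $b$ are both children of $A$ with no other known parents. The only path from $A$ to $a$ is the edge $A \to a$, so $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is necessarily mergeable and only Type 3 contributes:

$$
\Phi_{aab} = \Bigl(\frac{1}{2}\Bigr)^{L_{P_{Aa}} + L_{P_{Ab}} + 1} \Phi_{AA} = \Bigl(\frac{1}{2}\Bigr)^{1 + 1 + 1} \cdot \frac{1}{2} = \frac{1}{16}.
$$

The recursive formula $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$ gives the same value under these assumptions: the second parent of $a$ is unknown, so $\Phi_{fmb} = 0$, and Wright's formula gives $\Phi_{ab} = (1/2)^{2} \Phi_{AA} = 1/8$, hence $\Phi_{aab} = \frac{1}{2} \cdot \frac{1}{8} = \frac{1}{16}$.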

A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

$$
\Phi_{aabc} = \sum_{A} \Biggl( \sum_{\text{Type 1}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{quad}}-1} \Phi_{AAAA} + \sum_{\text{Type 2}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{quad}}+1} \Phi_{AA} \Biggr) + \sum_{A} \Biggl( \sum_{\text{Type 4}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{triple}}+1} \Phi_{AAA} + \sum_{\text{Type 5}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{triple}}+2} \Phi_{AA} \Biggr), \tag{A.3}
$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap and one root 3-overlap paths $P_{As}$ and $P_{At}$ ending at $s$ and $t$, respectively. (A.4)

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$
L_{\text{quad}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases}
$$

$$
L_{\text{triple}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}.
\end{cases} \tag{A.5}
$$


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ can be condensed to $\langle P_{Aa}, P_{Ab} \rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as a default mergeable path-pair. Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

$$
\Phi_{aaab} = \sum_{A} \Biggl( \frac{3}{2} \Biggl( \sum_{\text{Type 1}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{triple}}-1} \Phi_{AAA} + \sum_{\text{Type 2}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{triple}}} \Phi_{AA} \Biggr) + \sum_{\text{Type 3}} \Bigl(\frac{1}{2}\Bigr)^{L_{\text{pair}}+2} \Phi_{AA} \Biggr), \tag{A.6}
$$

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has zero root 2-overlap path.

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping.

$$
L_{\text{triple}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases}
\qquad
L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.7}
$$

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$).

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases corresponding to $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ having mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$.

Cases corresponding to $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap. (B.1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Edges denote either a parent-child relationship or an ancestor-descendant relationship.)

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^{2} \Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^{2} \Phi_{AAc} = (1/2)^{3} \Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc} = (1/2)^{2} \Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = (1/2)^{3} \Phi_{AAA}$.

Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{\ast}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{\ast}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus, it indicates $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} \ast \sum_{\text{Type 1}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AAA}, \\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 1}} \Bigl(\frac{1}{2}\Bigr)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}.
\end{aligned} \tag{B.2}
$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle}+1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc}$.

Thus, it is true for $n = k + 1$.

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.

Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2} \Phi_{bc} = (1/2)^{3} \Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2} \Phi_{bc} = (1/2)^{4} \Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^{2} \Phi_{ssc} = (1/2)^{3} \Phi_{sc} = (1/2)^{5} \Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}+1} \Phi_{cc} = (1/2)^{2+1} \Phi_{cc} = (1/2)^{3} \Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^{4} \Phi_{AA}$ and $\Phi_{abc} = (1/2)^{5} \Phi_{AA}$, respectively.
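The basis computations above can be reproduced with a small recursive evaluator. The sketch below is an illustration under stated assumptions, not the Recursive implementation used in the experiments: the name `make_phi` and the dictionary-of-parents representation are hypothetical, a missing parent is treated as unrelated to everyone (so any term containing it is 0), and only coefficients for up to three individuals are covered, using $\Phi_{aa} = \frac{1}{2}(1 + \Phi_{fm})$, $\Phi_{ab} = \frac{1}{2}(\Phi_{fb} + \Phi_{mb})$, $\Phi_{aaa} = \frac{1}{4}(1 + 3\Phi_{fm})$, $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$, and $\Phi_{abc} = \frac{1}{2}(\Phi_{fbc} + \Phi_{mbc})$.

```python
from fractions import Fraction


def make_phi(parents):
    """Build a recursive evaluator for Phi over 2 or 3 individuals.

    parents -- dict: individual -> tuple of known parents (0, 1, or 2).
    """
    def gen(x):
        known = parents.get(x, ())
        return 0 if not known else 1 + max(gen(p) for p in known)

    def phi(*inds):
        if None in inds:                 # a missing parent is unrelated
            return Fraction(0)
        if len(inds) == 1:
            return Fraction(1)
        # Recurse on the deepest individual: it cannot be an ancestor
        # of any other individual in the tuple.
        a = max(inds, key=gen)
        k = inds.count(a)
        rest = tuple(x for x in inds if x != a)
        f, m = (tuple(parents.get(a, ())) + (None, None))[:2]
        if k == 1:                       # Phi_ab, Phi_abc
            return (phi(f, *rest) + phi(m, *rest)) / 2
        if k == 2 and not rest:          # Phi_aa
            return (1 + phi(f, m)) / 2
        if k == 2:                       # Phi_aab
            return (phi(a, *rest) + phi(f, m, *rest)) / 2
        return (1 + 3 * phi(f, m)) / 4   # Phi_aaa
    return phi


# Figure 20(c): A -> s, s -> a, s -> b, A -> c (single known parents).
phi_20c = make_phi({"s": ("A",), "a": ("s",), "b": ("s",), "c": ("A",)})
# Figure 18(b): A -> a, A -> b, A -> c.
phi_18b = make_phi({"a": ("A",), "b": ("A",), "c": ("A",)})
```

With $A$ a non-inbred founder, `phi_20c("a", "b", "c")` gives $(1/2)^{5} \Phi_{AA} = 1/64$ and `phi_18b("a", "b", "c")` gives $(1/2)^{3} \Phi_{AAA} = 1/32$, matching the Type 2 and Type 1 path-counting values derived in the basis steps.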

Induction Step Let 119899 denote the number of edges in 119866Assume true for 119899 le 119896 where 119896 ge 2 Show that it is truefor = 119896 + 1

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \rightarrow a$ from $G$ and obtain a new graph $G^{*}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$:

$$\Phi_{fbc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1} \Phi_{AA}.$$

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\,\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\,\Phi_{fbc}
 = \frac{1}{2}\sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1} \Phi_{AA}
 = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1} \Phi_{AA}, \\
\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} &= L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\
\therefore\ \Phi_{abc} &= \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1} \Phi_{AA}
 = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1} \Phi_{AA}.
\end{aligned}
\tag{B.3}
$$

For Figures 21(b) and 21(c), we take the same steps as we did to calculate $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another. (b) $b$ is a parent of $a$. (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as follows:

$$\sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1} \Phi_{AA},$$

where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ has only one parent $f$,

$$\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\,\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B.4}$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\,\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.

Then we have

$$\Phi_{aab} = \frac{1}{2}\,\Phi_{ab} = \frac{1}{2}\sum_{P} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}} \Phi_{AA} = \sum_{P} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1} \Phi_{AA}.$$

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, this shows that the path-counting formula (A.1) is true for Case 2.3.
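The chain of equalities in this proof can be checked numerically on a toy pedigree. The following Python sketch is an illustration only, not the authors' implementation: the pedigree, the names `PARENTS`, `ORDER`, `kinship_recursive`, and `kinship_path_counting` are hypothetical, the recursion is the standard two-individual kinship recursion, and a nonoverlapping path-pair is assumed here to mean two paths that share no individual other than the root $A$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical pedigree in the spirit of Figure 22(a): A -> f -> a and A -> b.
# Each entry maps a child to (father, mother); None marks a missing parent.
PARENTS = {"A": (None, None), "f": ("A", None), "b": ("A", None), "a": ("f", None)}
# Assumed ordering in which every parent precedes its children.
ORDER = {name: rank for rank, name in enumerate(["A", "f", "b", "a"])}


def kinship_recursive(x, y):
    """Two-individual kinship coefficient computed by recursion."""
    if x is None or y is None:
        return Fraction(0)
    if x == y:
        father, mother = PARENTS[x]
        return Fraction(1, 2) * (1 + kinship_recursive(father, mother))
    if ORDER[x] < ORDER[y]:
        x, y = y, x  # always expand the individual that comes later in ORDER
    father, mother = PARENTS[x]
    return Fraction(1, 2) * (kinship_recursive(father, y) + kinship_recursive(mother, y))


def paths(ancestor, target):
    """All downward paths from ancestor to target, each as a tuple of individuals."""
    if ancestor == target:
        return [(target,)]
    found = []
    for parent in PARENTS[target]:
        if parent is not None:
            found.extend(p + (target,) for p in paths(ancestor, parent))
    return found


def kinship_path_counting(x, y):
    """Wright-style sum of (1/2)^L * Phi_AA over nonoverlapping path-pairs."""
    total = Fraction(0)
    for anc in PARENTS:
        for px, py in product(paths(anc, x), paths(anc, y)):
            if set(px) & set(py) == {anc}:  # the two paths share only the root
                length = (len(px) - 1) + (len(py) - 1)
                total += Fraction(1, 2) ** length * kinship_recursive(anc, anc)
    return total


phi_ab = kinship_path_counting("a", "b")
# a has a single known parent, so Phi_aab = (1/2) * Phi_ab as in (B.4).
phi_aab = Fraction(1, 2) * phi_ab
```

On this pedigree both routes give $\Phi_{ab} = (1/2)^{3}\,\Phi_{AA} = 1/16$, so $\Phi_{aab} = (1/2)^{3+1}\,\Phi_{AA} = 1/32$, matching the Type 3 term.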

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).


Figure 22: (a) $b$ is not an ancestor of $a$. (b) $b$ is an ancestor of $a$. (Legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. Proof by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$.

The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathrm{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathrm{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$, which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathrm{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a : A_1 \rightarrow \cdots \rightarrow a;\ t_b : A_1 \rightarrow \cdots \rightarrow b;\ t_c : A_1 \rightarrow \cdots \rightarrow c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, we know that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
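The graph manipulation in the induction step (pick a topmost triple-common ancestor $A_1$, then remove it to obtain $G'$) can be sketched as follows. This is a minimal illustration under assumptions: the pedigree is a hypothetical mapping from each individual to a tuple of its parents, an individual counts as its own ancestor (as $c$ does in Figure 20(a)), and the function names are not from the paper.

```python
def ancestors_or_self(parents, x):
    """Return x together with all of its ancestors in the pedigree."""
    seen = set()
    stack = [x]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, ()))
    return seen


def triple_common_ancestors(parents, a, b, c):
    """Individuals that are ancestors (or self) of all three of a, b, and c."""
    return (ancestors_or_self(parents, a)
            & ancestors_or_self(parents, b)
            & ancestors_or_self(parents, c))


def topmost(parents, candidates):
    """Pick a candidate that has no other candidate among its strict ancestors."""
    for cand in sorted(candidates):
        strict = ancestors_or_self(parents, cand) - {cand}
        if not (strict & candidates):
            return cand
    return None


def remove_individual(parents, victim):
    """Build G' by deleting victim and every edge that leaves it."""
    return {child: tuple(p for p in ps if p != victim)
            for child, ps in parents.items() if child != victim}


# Hypothetical pedigree with two triple-common ancestors, A1 above A2.
G = {"A1": (), "A2": ("A1",), "a": ("A2",), "b": ("A2",), "c": ("A2", "A1")}
tri_c = triple_common_ancestors(G, "a", "b", "c")
a1 = topmost(G, tri_c)
G_prime = remove_individual(G, a1)
```

In this example, $G'$ keeps only $A_2$ as a triple-common ancestor, which is the situation the induction hypothesis requires.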

C. Proof for Four Individuals and Two Pairs of Individuals

Here, we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$, listed in (C.1).

Cases arising from $\Phi_{aaab}$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.

Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.

Cases arising from $\Phi_{aabc}$:

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.

Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlap.

Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.

Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.

Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Cases arising from $\Phi_{abcd}$:

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.

Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.

Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlap.

Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.

Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.

(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

Figure 23: Dependency graph for different cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formula for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pair regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlap.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlap.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlap and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formula for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Figure 24: Dependency graph for different cases for two pairs of individuals.

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] “Surgeon General’s New Family Health History Tool Is Released, Ready for ‘21st Century Medicine’,” http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., “A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia,” The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., “New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate,” Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malecot, Les mathématiques de l’hérédité, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, “La relation d’identité en génétique,” Annales de l’Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, “Genotypic covariances between inbred relatives,” Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, “Logique du calcul des coefficients d’identité entre deux individus,” Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, “A recursive algorithm for the calculation of identity coefficients,” Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, “Efficient evaluation of inbreeding queries on pedigree data,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, “Efficiently calculating inbreeding on large pedigrees databases,” Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, “Using compact encodings for path-based computations on pedigree graphs,” in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB ’11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Scalable computation of kinship and identity coefficients on large pedigrees,” in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB ’08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Efficient computation of kinship and identity coefficients on large pedigrees,” Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, “Coefficients of inbreeding and relationship,” The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, “Kinship and identity algorithm of coefficients of identity,” Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., “Data mining applied to linkage disequilibrium mapping,” The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, “Calculation of the inbreeding coefficient,” Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ at path-pair level (scenarios $S_0$ to $S_{16}$).

Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

Type 3: either (i) zero root homo-overlap and two root heter-overlap $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively; or (ii) one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlap $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, and $r_1 = r_2$.

(17)

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlap ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td})\rangle$ ($S \ne T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb}\rangle$ has zero overlap individual; $\langle P_{Tc}, P_{Td}\rangle$ has zero overlap individual.

At most one path-pair (either $\langle P_{Sa}, P_{Sb}\rangle$ or $\langle P_{Tc}, P_{Td}\rangle$) can have crossover individuals.

Between a path from $\langle P_{Sa}, P_{Sb}\rangle$ and a path from $\langle P_{Tc}, P_{Td}\rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \ne S$ and $x \ne T$.

$$
B = \begin{cases}
S, & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\
S, & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\
T, & \text{otherwise}.
\end{cases}
$$

$$
L_{\text{2-pair}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}}, & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}}, & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}}, & \text{for Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 * L_{P_{Ah}}, & \text{for Type 4},
\end{cases}
$$

$$
L_{\langle P_{Sa}, P_{Sb}\rangle} = L_{P_{Sa}} + L_{P_{Sb}} \ \text{for Type 5}, \qquad
L_{\langle P_{Tc}, P_{Td}\rangle} = L_{P_{Tc}} + L_{P_{Td}} \ \text{for Type 5}.
\tag{18}
$$
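The case analysis for $L_{\text{2-pair}}$ in (18) can be written as a small helper. The sketch below only mirrors the four same-ancestor cases (Types 1 to 4); the function name `two_pair_length` and the way overlap lengths are passed in are assumptions made for illustration.

```python
def two_pair_length(path_type, len_a, len_b, len_c, len_d, overlaps=()):
    """L_2-pair of <(P_Aa, P_Ab), (P_Ac, P_Ad)> for Types 1 to 4 of (18).

    overlaps holds the lengths of the relevant root overlap paths:
    ()         for Type 1,
    (L_r,)     for Type 2,
    (L_r1, L_r2) for Type 3,
    (L_h,)     for Type 4.
    """
    total = len_a + len_b + len_c + len_d
    if path_type == 1:
        return total
    if path_type == 2:
        (len_r,) = overlaps
        return total - len_r
    if path_type == 3:
        len_r1, len_r2 = overlaps
        return total - len_r1 - len_r2
    if path_type == 4:
        (len_h,) = overlaps
        return total - 2 * len_h
    raise ValueError("Types 5 and 6 use separate path-pair lengths")


# Hypothetical example: four paths of length 2 with two heter-overlaps of length 1.
example_length = two_pair_length(3, 2, 2, 2, 2, overlaps=(1, 1))
```

Passing the overlap lengths explicitly keeps the helper independent of how paths are stored or encoded.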

Note that if $\langle a, b\rangle$ and $\langle c, d\rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb}\rangle} + L_{\langle P_{Tc}, P_{Td}\rangle}} \Phi_{SS} * \Phi_{TT}. \tag{19}$$

Type 6: $\langle P_{Sa}, P_{Sb}\rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td}\rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb}\rangle$ and a path from $\langle P_{Tc}, P_{Td}\rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb}\rangle}$ and $L_{\langle P_{Tc}, P_{Td}\rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aabc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aaab}$.
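A single summand of (19) is easy to evaluate once the path lengths are known. The following sketch assumes a hypothetical pedigree in which $a$ and $b$ are half-sibs through a non-inbred founder $S$, and $c$ and $d$ are half-sibs through an unrelated non-inbred founder $T$; the function name `type6_term` is illustrative.

```python
from fractions import Fraction


def type6_term(len_sa, len_sb, len_tc, len_td, phi_ss, phi_tt):
    """One summand (1/2)^(L<P_Sa,P_Sb> + L<P_Tc,P_Td>) * Phi_SS * Phi_TT of (19)."""
    exponent = (len_sa + len_sb) + (len_tc + len_td)
    return Fraction(1, 2) ** exponent * phi_ss * phi_tt


# All four paths have length 1; Phi_SS = Phi_TT = 1/2 for non-inbred founders.
term = type6_term(1, 1, 1, 1, Fraction(1, 2), Fraction(1, 2))
```

In this example the term equals the product of the two half-sib kinship coefficients, $(1/8)(1/8) = 1/64$, as expected when the two pairs share no ancestors.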

3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by making comparisons with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node is a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as NC($u$), is assigned using a breadth-first-search traversal starting from $r$ as follows.

(1) If $u$ is $r$, then NC($r$) contains only one element: the empty string.

(2) Otherwise, let $u$ be a node with NC($u$) and $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in NC($u$), a code $x i *$ is added to NC($v_i$), where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
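The two assignment rules above can be sketched in Python as follows. This is a minimal illustration, not the implementation from [18]: codes are represented as tuples of labels such as `"0M"` rather than strings, the names `assign_nodecodes`, `children`, and `gender` are hypothetical, and a node is expanded only after all of its parents have been expanded (an assumption made here so that NC($u$) is complete before it is extended to the children of $u$).

```python
from collections import deque


def assign_nodecodes(children, gender, root):
    """Assign NodeCodes to every node reachable from the progenitor root.

    children maps a node to its children in sibling order; gender maps a
    node to a one-letter marker that plays the role of '*' in the text.
    """
    indegree = {root: 0}
    for parent, kids in children.items():
        indegree.setdefault(parent, 0)
        for kid in kids:
            indegree[kid] = indegree.get(kid, 0) + 1

    codes = {node: set() for node in indegree}
    codes[root].add(())  # rule (1): the progenitor holds only the empty code

    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            label = f"{i}{gender[v]}"
            for x in codes[u]:
                codes[v].add(x + (label,))  # rule (2): extend x by i and gender
            indegree[v] -= 1
            if indegree[v] == 0:  # every parent of v has now been expanded
                queue.append(v)
    return codes


# Hypothetical pedigree: virtual progenitor r, founders f and m, children a and b.
children = {"r": ["f", "m"], "f": ["a", "b"], "m": ["a", "b"]}
gender = {"f": "M", "m": "F", "a": "F", "b": "M"}
nodecodes = assign_nodecodes(children, gender, "r")
```

Each individual ends up with one code per path from the progenitor, so in this example `a` and `b` each receive two codes.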


The computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes scales very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. The improvement of NodeCodes over Recursive ranges from 86.48% to 91.30%.

Figure 14: The effect of pedigree size on computation efficiency improvement (x-axis: individuals in pedigree; y-axis: average time (ms); series: Recursive and NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (x-axis: depth; y-axis: average time (ms); series: Recursive and NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula for more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.

Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2}\rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.


Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ (scenarios $S_0$ to $S_3$; $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is drawn as a single path $P_{Aa}$ if it is mergeable).

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2}\rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$
\Phi_{aab} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}-1} \Phi_{AAA}
+ \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA}
+ \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1} \Phi_{AA} \right),
\tag{A.1}
$$

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has no root 2-overlap.

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair.

$$
L_{\text{triple}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}}, & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}}, & \text{for Type 2},
\end{cases}
\qquad
L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} \ \text{for Type 3}.
\tag{A.2}
$$
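The three kinds of summand in (A.1) differ only in the exponent offset and in which coefficient of $A$ they multiply. The sketch below makes this explicit; the function name `phi_aab_term` is hypothetical, and the worked example assumes a pedigree in which $a$ and $b$ are half-sibs whose only common ancestor is a non-inbred parent $A$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_aab_term(path_type, length, phi):
    """One summand of (A.1).

    length is L_triple for Types 1 and 2, or L<P_Aa, P_Ab> for Type 3;
    phi is Phi_AAA for Type 1 and Phi_AA for Types 2 and 3.
    """
    exponent = {1: length - 1, 2: length, 3: length + 1}[path_type]
    return HALF ** exponent * phi


# Half-sibs a and b through A: the only path A -> a is used twice, so the
# path-pair <P_Aa1, P_Aa2> is mergeable and the term is of Type 3 with
# L<P_Aa, P_Ab> = 1 + 1 = 2 and Phi_AA = 1/2.
term_type3 = phi_aab_term(3, 2, Fraction(1, 2))
```

Here the term is $(1/2)^{3} \cdot (1/2) = 1/16$, which agrees with the recursive formula (4): $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb}) = (1/2)(1/8 + 0)$ when the other parent of $a$ is unrelated to $A$ and $b$.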

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.

A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

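$$\begin{aligned} \Phi_{aabc} = {} & \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}} - 1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AA} \right) \\ & + \sum_{A} \left( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 2} \Phi_{AA} \right) \end{aligned} \tag{A.3}$$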

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.

Case 2: one root 3-overlap path $P_{At}$ ending at $t$.

Case 3: one root 2-overlap and one root 3-overlap paths, $P_{As}$ and $P_{At}$, ending at $s$ and $t$, respectively.

(A.4)

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases}$$

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}. \end{cases} \tag{A.5}$$


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$

A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ can be condensed to $\langle P_{Aa}, P_{Ab} \rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. Hence, for a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as a default mergeable path-pair. Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

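$$\Phi_{aaab} = \sum_{A} \left( \frac{3}{2} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} \right) + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{pair}} + 2} \Phi_{AA} \right) \tag{A.6}$$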

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has zero root 2-overlap path.

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.7}$$
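One way to see where the factor 3/2 and the exponent $L_{\text{pair}} + 2$ come from is to compare the recursions for $\Phi_{aab}$ and $\Phi_{aaab}$. The sketch below assumes Karigl's recursion $\Phi_{aaab} = \frac{1}{4}(\Phi_{ab} + 3\Phi_{fmb})$ from [10], which is not restated in this appendix, together with the recursive formula (4), $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$, where $f$ and $m$ are the parents of $a$. The pairing of terms with path types is an interpretation and should be read as a plausibility check, not as a proof.

```latex
\begin{aligned}
\Phi_{aab}  &= \underbrace{\tfrac{1}{2}\,\Phi_{fmb}}_{\text{Types 1 and 2 in (A.1)}}
             + \underbrace{\tfrac{1}{2}\,\Phi_{ab}}_{\text{Type 3 in (A.1)}}, \\[4pt]
\Phi_{aaab} &= \tfrac{3}{4}\,\Phi_{fmb} + \tfrac{1}{4}\,\Phi_{ab}
             = \tfrac{3}{2}\Bigl(\tfrac{1}{2}\,\Phi_{fmb}\Bigr)
             + \tfrac{1}{2}\Bigl(\tfrac{1}{2}\,\Phi_{ab}\Bigr).
\end{aligned}
```

Under this reading, the Type 1 and Type 2 sums keep the exponents they have in (A.1) and are scaled by 3/2, while the Type 3 exponent grows by one, from $L_{\langle P_{Aa}, P_{Ab} \rangle} + 1$ to $L_{\text{pair}} + 2$.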

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$).

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases:

For $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.

Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ having mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$.

For $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

(B.1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Legend: parent-child relationship; ancestor-descendant relationship.)

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof. (Basis) There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^2 \Phi_{AAc} = (1/2)^3 \Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = (1/2)^3 \Phi_{AAA}$.
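The basis values for Figure 18 can be reproduced numerically. The sketch below implements the three-individual recursions in the form they are quoted in this appendix, $\Phi_{abc} = \frac{1}{2}(\Phi_{fbc} + \Phi_{mbc})$ and $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$, and additionally assumes Karigl's $\Phi_{aa} = \frac{1}{2}(1 + \Phi_{fm})$ and $\Phi_{aaa} = \frac{1}{4}(1 + 3\Phi_{fm})$ from [10]. The pedigree encoding (a dict mapping each individual to a pair of parents, with `None` for a missing parent) and all function names are hypothetical. Any coefficient involving a missing parent is taken to be 0, as in the proofs here.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def depth(ped, x):
    """Longest parent chain above x; founders have depth 0."""
    known = [p for p in ped[x] if p is not None]
    return 1 + max(depth(ped, p) for p in known) if known else 0


def phi2(ped, a, b):
    """Kinship coefficient Phi_ab (Phi_aa when a == b)."""
    if a is None or b is None:
        return Fraction(0)
    if a == b:
        f, m = ped[a]
        return HALF * (1 + phi2(ped, f, m))
    if depth(ped, a) < depth(ped, b):
        a, b = b, a                      # expand the deeper individual
    f, m = ped[a]
    return HALF * (phi2(ped, f, b) + phi2(ped, m, b))


def phi3(ped, a, b, c):
    """Generalized kinship coefficient for three (not necessarily distinct) individuals."""
    if None in (a, b, c):
        return Fraction(0)
    x = max((a, b, c), key=lambda v: depth(ped, v))   # never an ancestor of the rest
    rest = [v for v in (a, b, c) if v != x]
    f, m = ped[x]
    if not rest:                                      # Phi_xxx
        return Fraction(1, 4) * (1 + 3 * phi2(ped, f, m))
    if len(rest) == 1:                                # Phi_xxy
        return HALF * (phi2(ped, x, rest[0]) + phi3(ped, f, m, rest[0]))
    y, z = rest                                       # Phi_xyz
    return HALF * (phi3(ped, f, y, z) + phi3(ped, m, y, z))


# Figure 18(a): c is the only recorded parent of a and b.
fig18a = {"c": (None, None), "a": ("c", None), "b": ("c", None)}
# Figure 18(b): A is the only recorded parent of a, b, and c.
fig18b = {"A": (None, None), "a": ("A", None), "b": ("A", None), "c": ("A", None)}

value_a = phi3(fig18a, "a", "b", "c")
value_b = phi3(fig18b, "a", "b", "c")
```

With a noninbred founder, $\Phi_{ccc} = \Phi_{AAA} = 1/4$, so the recursion returns $(1/2)^2 \cdot 1/4$ for Figure 18(a) and $(1/2)^3 \cdot 1/4$ for Figure 18(b), the same values the path-counting formula gives in the basis step.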

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^*$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^*$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2} \Phi_{fbc} = \frac{1}{2} \cdot \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AAA}, \\ &\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\ &\therefore\; \Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}. \end{aligned} \tag{B.2}$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc}$.

Thus, it is true for $n = k + 1$.

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof. (Basis) There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.


Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^3 \Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^4 \Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2 \Phi_{ssc} = (1/2)^3 \Phi_{sc} = (1/2)^5 \Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{cc} = (1/2)^{2 + 1} \Phi_{cc} = (1/2)^3 \Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4 \Phi_{AA}$ and $\Phi_{abc} = (1/2)^5 \Phi_{AA}$, respectively.
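The exponents obtained for Figure 20 can be checked mechanically from the paths themselves. The following sketch represents each path as a list of individuals starting at the root, measures the root overlap of two paths as their shared leading edges, and applies the Type 1 and Type 2 terms of formula (12) as quoted in this appendix. The helper names and the encoding of the three panels are hypothetical, the root is assumed to be a noninbred founder ($\Phi_{AA} = 1/2$, $\Phi_{AAA} = 1/4$), and the sketch does not check any further eligibility conditions on path-triples.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def common_prefix_edges(p, q):
    """Number of leading edges shared by two paths that start at the same root."""
    shared = 0
    for i in range(1, min(len(p), len(q))):
        if p[i] != q[i]:
            break
        shared += 1
    return shared


def triple_contribution(pa, pb, pc, phi_aa, phi_aaa):
    """Contribution of one path-triple to Phi_abc (Type 1 or Type 2 only)."""
    total = sum(len(p) - 1 for p in (pa, pb, pc))          # lengths in edges
    overlaps = [common_prefix_edges(x, y)
                for x, y in ((pa, pb), (pa, pc), (pb, pc))]
    shared = [o for o in overlaps if o > 0]
    if not shared:                                         # Type 1: no root overlap
        return HALF ** total * phi_aaa
    if len(shared) == 1:                                   # Type 2: one root 2-overlap
        return HALF ** (total - shared[0] + 1) * phi_aa
    raise ValueError("path-triple is neither Type 1 nor Type 2")


PHI_AA, PHI_AAA = Fraction(1, 2), Fraction(1, 4)           # assumed founder values

# Figure 20(a): c -> b -> a, so the paths to a and b share the edge c -> b.
fig20a = triple_contribution(["c", "b", "a"], ["c", "b"], ["c"], PHI_AA, PHI_AAA)
# Figure 20(b): A -> b -> a and A -> c.
fig20b = triple_contribution(["A", "b", "a"], ["A", "b"], ["A", "c"], PHI_AA, PHI_AAA)
# Figure 20(c): A -> s -> a, A -> s -> b, and A -> c.
fig20c = triple_contribution(["A", "s", "a"], ["A", "s", "b"], ["A", "c"], PHI_AA, PHI_AAA)
```

The three values are $(1/2)^3$, $(1/2)^4$, and $(1/2)^5$ times $\Phi_{AA}$ (or $\Phi_{cc}$), in agreement with the recursive computation above.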

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^*$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^*$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2} \Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA}, \\ &\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\ &\therefore\; \Phi_{abc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}. \end{aligned} \tag{B.3}$$

For Figures 21(b) and 21(c), we take the same steps as we take to calculate $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,

$$\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb}) = \frac{1}{2}(\Phi_{ab} + 0) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B.4}$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2) \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, the path-counting formula (A.1) is true for Case 2.3.
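The reduction of Case 2.3 to Wright's formula can be illustrated on a small pedigree. The sketch below enumerates directed paths in a pedigree stored as a parent-to-children dict, keeps the path-pairs that share only the root (the nonoverlapping path-pairs), and sums $(1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$ over the supplied common ancestors. The pedigree, the function names, and the value $\Phi_{AA} = 1/2$ (a noninbred founder) are assumptions chosen for illustration; the Type 3 sum is only compared with $\frac{1}{2}\Phi_{ab}$ for the situation of (B.4), where $a$ has a single recorded parent.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def paths(children, src, dst):
    """All directed paths (lists of individuals) from src to dst in a pedigree DAG."""
    if src == dst:
        return [[src]]
    found = []
    for child in children.get(src, ()):
        found.extend([src] + rest for rest in paths(children, child, dst))
    return found


def nonoverlapping_pairs(children, root, a, b):
    """Path-pairs <P_Aa, P_Ab> whose only shared individual is the root."""
    return [(pa, pb)
            for pa in paths(children, root, a)
            for pb in paths(children, root, b)
            if set(pa) & set(pb) == {root}]


def wright_phi_ab(children, root_phi, a, b):
    """Wright's formula: sum over common ancestors and nonoverlapping path-pairs."""
    total = Fraction(0)
    for root, phi_aa in root_phi.items():
        for pa, pb in nonoverlapping_pairs(children, root, a, b):
            total += HALF ** (len(pa) + len(pb) - 2) * phi_aa
    return total


def type3_phi_aab(children, root_phi, a, b):
    """Type 3 sum of (A.1): exponent L_<P_Aa, P_Ab> + 1."""
    total = Fraction(0)
    for root, phi_aa in root_phi.items():
        for pa, pb in nonoverlapping_pairs(children, root, a, b):
            total += HALF ** (len(pa) + len(pb) - 2 + 1) * phi_aa
    return total


# Hypothetical pedigree: A -> f -> a and A -> b; f is the only recorded parent of a.
children = {"A": ["f", "b"], "f": ["a"]}
root_phi = {"A": Fraction(1, 2)}          # assumed Phi_AA of a noninbred founder

phi_ab = wright_phi_ab(children, root_phi, "a", "b")
phi_aab = type3_phi_aab(children, root_phi, "a", "b")
```

Here the single nonoverlapping path-pair has length 3, so $\Phi_{ab} = (1/2)^3 \Phi_{AA}$ and the Type 3 sum equals $(1/2)^4 \Phi_{AA} = \frac{1}{2}\Phi_{ab}$, as in (B.4).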

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).


Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. The proof is by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $Tri\_C(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $Tri\_C(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$, which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $Tri\_C(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a: A_1 \to \cdots \to a,\; t_b: A_1 \to \cdots \to b,\; t_c: A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
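The graph operations used in the induction step (collecting $Tri\_C(a, b, c, G)$, picking the topmost element $A_1$, and forming $G'$) can be sketched as follows. The pedigree is stored as a child-to-parents dict, an individual is counted among its own ancestors (consistent with Figure 18(a), where $c$ is a triple-common ancestor of $a$, $b$, and $c$), and all names and the example pedigree are hypothetical.

```python
def ancestors(parents, x):
    """All ancestors of x, including x itself."""
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def triple_common_ancestors(parents, a, b, c):
    """Tri_C(a, b, c, G): individuals that can reach a, b, and c."""
    return ancestors(parents, a) & ancestors(parents, b) & ancestors(parents, c)


def topmost(parents, candidates):
    """A candidate that has no other candidate among its proper ancestors."""
    for cand in sorted(candidates):
        if not (ancestors(parents, cand) - {cand}) & candidates:
            return cand
    return None


def remove_individual(parents, x):
    """G': drop x together with all edges leaving x (x -> child)."""
    return {child: [p for p in ps if p != x]
            for child, ps in parents.items() if child != x}


# Hypothetical pedigree: A1 -> A2, and A2 is the recorded parent of a, b, and c.
parents = {"A2": ["A1"], "a": ["A2"], "b": ["A2"], "c": ["A2"]}

tri_c = triple_common_ancestors(parents, "a", "b", "c")
a1 = topmost(parents, tri_c)
g_prime = remove_individual(parents, a1)
tri_c_prime = triple_common_ancestors(g_prime, "a", "b", "c")
```

In this example $A_2$ is a descendant of $A_1$, which is case (2) of the proof: every path-triple from $A_1$ passes through $A_2$ on all three paths, so $A_1$ contributes no eligible path-triple and $G'$ retains the single triple-common ancestor $A_2$.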

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$:

For $\Phi_{aaab}$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with zero root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with one root overlap.

Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ with zero root overlap.

For $\Phi_{aabc}$:

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.

Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap.

Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with two root 2-overlap.

Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 3-overlap.

Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap and one root 3-overlap.

Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.

Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

For $\Phi_{abcd}$:

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with zero root overlap.

Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap.

Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with two root 2-overlap.

Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 3-overlap.

Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap and one root 3-overlap.

(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

Figure 23: Dependency graph for different cases for four individuals.

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pair regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows:

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlap.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlap.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlap and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Figure 24: Dependency graph for different cases for two pairs of individuals (nodes: Case 4.1, Case 4.2, Case 4.3.1, Case 4.3.2, Case 4.4, Case 4.6, Case 4.7, Case 4.8, $\Phi_{AAAA}$, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{TT}$).

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine", http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malecot, Les mathematique de l'heredite, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identite en genetique," Annales de l'Institut Henri Poincare B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identite entre deux individuals," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


The computations of kinship coefficients for two individuals and of generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and of the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.

Figure 14: The effect of pedigree size on computation efficiency improvement (average time in ms versus number of individuals in the pedigree, for Recursive and NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (average time in ms versus depth, for Recursive and NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on the computation of condensed identity coefficients by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.

Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.

Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (scenarios $S_0$ to $S_3$; if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the pair is condensed to $P_{Aa}$).

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.
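Definition A.1 amounts to an equality test on node sequences. A minimal sketch, assuming each path is stored as a sequence of individual identifiers from the ancestor down to the descendant; this representation and the name `is_mergeable` are illustrative and are not taken from the paper:

```python
def is_mergeable(path_1, path_2):
    """Return True when two descent paths are completely identical."""
    return tuple(path_1) == tuple(path_2)


# Two hypothetical paths from ancestor "A" to individual "a" through different parents.
via_father = ("A", "f", "a")
via_mother = ("A", "m", "a")

print(is_mergeable(via_father, via_father))
print(is_mergeable(via_father, via_mother))
```

The first call compares a path with itself, so the pair is mergeable; the second pair differs in the intermediate parent and is not.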

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$, by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$\Phi_{aab} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA} \right), \tag{A.1}$$

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap.

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

$$\begin{aligned}
L_{\text{triple}} &= \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2,} \end{cases} \\
L_{\langle P_{Aa}, P_{Ab} \rangle} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3.}
\end{aligned} \tag{A.2}$$
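The three types in (A.1) and (A.2) can be evaluated term by term. The sketch below is illustrative only: the function name, its arguments, and the toy pedigree are assumptions, not part of the paper. The pedigree assumes a noninbred founder $A$ (so $\Phi_{AA} = 1/2$ and $\Phi_{AAA} = 1/4$) with three children $f$, $m$, and $b$ by unrelated founders, and $a$ as the child of $f$ and $m$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_aab_term(path_type, len_a1, len_a2, len_b, len_overlap, phi_AA, phi_AAA):
    """Contribution of one path-triple to Phi_aab, following (A.1) and (A.2)."""
    if path_type == 1:    # not mergeable, no root 2-overlap
        l_triple = len_a1 + len_a2 + len_b
        return HALF ** (l_triple - 1) * phi_AAA
    if path_type == 2:    # not mergeable, one root 2-overlap path P_As
        l_triple = len_a1 + len_a2 + len_b - len_overlap
        return HALF ** l_triple * phi_AA
    if path_type == 3:    # mergeable: the two paths to a are identical
        l_pair = len_a1 + len_b
        return HALF ** (l_pair + 1) * phi_AA
    raise ValueError("path_type must be 1, 2, or 3")


# Assumed values for a noninbred founder A.
phi_AA, phi_AAA = Fraction(1, 2), Fraction(1, 4)

terms = [
    # <A->f->a, A->m->a, A->b>: distinct paths to a, no root overlap (Type 1).
    phi_aab_term(1, 2, 2, 1, 0, phi_AA, phi_AAA),
    # <A->f->a, A->f->a, A->b>: mergeable, condensed to a path-pair (Type 3).
    phi_aab_term(3, 2, 2, 1, 0, phi_AA, phi_AAA),
    # <A->m->a, A->m->a, A->b>: mergeable, condensed to a path-pair (Type 3).
    phi_aab_term(3, 2, 2, 1, 0, phi_AA, phi_AAA),
]
phi_aab = sum(terms)
print(phi_aab)  # 5/64
```

For this pedigree the result agrees with the recursion $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$ used in Appendix B, with $\Phi_{ab} = 1/8$ and $\Phi_{fmb} = 1/32$. Note that the non-mergeable pair of paths to $a$ is counted once, not once per ordering.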

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.

A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

$$\begin{aligned}
\Phi_{aabc} ={} & \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}} - 1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AA} \right) \\
& + \sum_{A} \left( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 2} \Phi_{AA} \right),
\end{aligned} \tag{A.3}$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.

Case 2: one root 3-overlap path $P_{At}$ ending at $t$.

Case 3: one root 2-overlap and one root 3-overlap paths $P_{As}$ and $P_{At}$ ending at $s$ and $t$, respectively.

(A.4)

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$\begin{aligned}
L_{\text{quad}} &= \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1,} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2,} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3,} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3,} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3,}
\end{cases} \\
L_{\text{triple}} &= \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4,} \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5.}
\end{cases}
\end{aligned} \tag{A.5}$$
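Because every type in (A.3) differs only in the exponent offset and the kinship factor of $A$, the formula can be driven by two small lookup tables. This is a sketch under stated assumptions: the table and function names are hypothetical, and the toy pedigree assumes a noninbred founder $A$ ($\Phi_{AA} = 1/2$, $\Phi_{AAA} = 1/4$, $\Phi_{AAAA} = 1/8$) with four children $f$, $m$, $b$, and $c$ by unrelated founders, and $a$ as the child of $f$ and $m$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Exponent offset and kinship factor for each type in (A.3).
# Types 1-3 use L_quad; Types 4-5 use L_triple (see (A.5)).
OFFSET = {1: -1, 2: 0, 3: 1, 4: 1, 5: 2}
FACTOR = {1: "AAAA", 2: "AAA", 3: "AA", 4: "AAA", 5: "AA"}


def phi_aabc_term(path_type, path_lengths, overlap_lengths, phi_A):
    """Contribution of one path-quad (Types 1-3) or condensed path-triple (Types 4-5).

    path_lengths: lengths of the four paths (Types 1-3) or three paths (Types 4-5).
    overlap_lengths: lengths of the root overlap paths subtracted in (A.5).
    phi_A: dict with keys "AA", "AAA", "AAAA" for the common ancestor A.
    """
    length = sum(path_lengths) - sum(overlap_lengths)
    return HALF ** (length + OFFSET[path_type]) * phi_A[FACTOR[path_type]]


# Assumed values for a noninbred founder A.
phi_A = {"AA": Fraction(1, 2), "AAA": Fraction(1, 4), "AAAA": Fraction(1, 8)}

terms = [
    # <A->f->a, A->m->a, A->b, A->c>: not mergeable, no root overlap (Type 1).
    phi_aabc_term(1, (2, 2, 1, 1), (), phi_A),
    # Mergeable pair through f, condensed to <A->f->a, A->b, A->c> (Type 4).
    phi_aabc_term(4, (2, 1, 1), (), phi_A),
    # Mergeable pair through m, condensed to <A->m->a, A->b, A->c> (Type 4).
    phi_aabc_term(4, (2, 1, 1), (), phi_A),
]
phi_aabc = sum(terms)
print(phi_aabc)  # 5/256
```

Keeping the offsets in a table makes it easy to compare them against (A.3) line by line.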


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ can be condensed to $\langle P_{Aa}, P_{Ab} \rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must therefore be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as a default mergeable path-pair.

Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

$$\Phi_{aaab} = \sum_{A} \left( \frac{3}{2} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} \right) + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{pair}} + 2} \Phi_{AA} \right), \tag{A.6}$$

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has zero root 2-overlap path.

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping.

$$\begin{aligned}
L_{\text{triple}} &= \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1,} \\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2,} \end{cases} \\
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3.}
\end{aligned} \tag{A.7}$$
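The role of the factor $3/2$ in (A.6) is easiest to see on a small instance. The following sketch is an illustration, not the authors' implementation; the function name and the pedigree are assumptions. It evaluates the bracketed term of (A.6) for a single common ancestor, given the $L_{\text{triple}}$ and $L_{\text{pair}}$ values from (A.7). The pedigree assumed here has a noninbred founder $A$ ($\Phi_{AA} = 1/2$, $\Phi_{AAA} = 1/4$) with children $f$, $m$, and $b$ by unrelated founders, and $a$ as the child of $f$ and $m$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_aaab_for_ancestor(type1_lengths, type2_lengths, type3_lengths, phi_AA, phi_AAA):
    """Evaluate the bracketed term of (A.6) for one common ancestor A.

    type1_lengths: L_triple values of Type 1 path-triples.
    type2_lengths: L_triple values of Type 2 path-triples (overlap already subtracted).
    type3_lengths: L_pair values of Type 3 path-pairs.
    """
    one_pair_merged = sum(HALF ** (l - 1) * phi_AAA for l in type1_lengths)
    one_pair_merged += sum(HALF ** l * phi_AA for l in type2_lengths)
    all_merged = sum(HALF ** (l + 2) * phi_AA for l in type3_lengths)
    return Fraction(3, 2) * one_pair_merged + all_merged


# Type 1: <A->f->a, A->m->a, A->b> with L_triple = 2 + 2 + 1 = 5.
# Type 3: <A->f->a, A->b> and <A->m->a, A->b>, each with L_pair = 2 + 1 = 3.
phi_aaab = phi_aaab_for_ancestor([5], [], [3, 3], Fraction(1, 2), Fraction(1, 4))
print(phi_aaab)  # 7/128
```

In this instance the Type 1 triple contributes $3/128$ and the two Type 3 pairs contribute $1/64$ each.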

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{ab}$, and $\Phi_{AA}$).

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases for $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.

Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ having mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$.

Cases for $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

(B.1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Edge legend: parent-child relationship; ancestor-descendant relationship.)

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^2 \Phi_{AAc} = (1/2)^3 \Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = (1/2)^3 \Phi_{AAA}$.

Induction Step. Let $n$ denote the number of edges in $G$. Assume the formula is true for $n \le k$, where $k \ge 2$. Then we show that it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$\begin{aligned}
\Phi_{abc} &= \frac{1}{2} \Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AAA}, \\
&\text{and since } L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
\Phi_{abc} &= \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}.
\end{aligned} \tag{B.2}$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc}$.

Thus it is true for $n = k + 1$.
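The basis case of Figure 18(b) can be checked numerically by running the recursion and comparing it with the Type 1 term $(1/2)^3 \Phi_{AAA}$. The sketch below is a minimal illustration under several assumptions that do not come from this appendix: the pedigree encoding, the function names, unrelated noninbred founders, and the boundary rule $\Phi_{aaa} = (1 + 3\Phi_{fm})/4$, which is the form usually attributed to [10]. Formulas (3) and (4) are used as quoted in this appendix.

```python
from fractions import Fraction

# Hypothetical pedigree for Figure 18(b): A is a parent of a, b, and c.
# child -> (parent_1, parent_2); founders (A, x, y, z) have no entry.
PARENTS = {"a": ("A", "x"), "b": ("A", "y"), "c": ("A", "z")}


def depth(ind):
    """Longest number of generations between ind and a founder."""
    parents = PARENTS.get(ind)
    return 0 if parents is None else 1 + max(depth(p) for p in parents)


def phi(*inds):
    """Generalized kinship coefficient for one, two, or three individuals."""
    # Recurse on the deepest individual, which cannot be an ancestor of the others.
    inds = sorted(inds, key=depth, reverse=True)
    a = inds[0]
    k = inds.count(a)
    rest = [i for i in inds if i != a]
    parents = PARENTS.get(a)
    if not rest:                        # Phi_a, Phi_aa, or Phi_aaa
        if k == 1:
            return Fraction(1)
        phi_fm = phi(*parents) if parents else Fraction(0)
        return (1 + phi_fm) / 2 if k == 2 else (1 + 3 * phi_fm) / 4
    if parents is None:                 # a is a founder unrelated to the rest
        return Fraction(0)
    f, m = parents
    if k == 1:                          # formula (3) and its pairwise analogue
        return (phi(f, *rest) + phi(m, *rest)) / 2
    # k == 2: formula (4), Phi_aab = (Phi_ab + Phi_fmb) / 2
    return (phi(a, *rest) + phi(f, m, *rest)) / 2


recursive_value = phi("a", "b", "c")
path_counting_value = Fraction(1, 2) ** 3 * phi("A", "A", "A")
print(recursive_value, path_counting_value)  # 1/32 1/32
```

Both routes give $1/32$ for this pedigree, matching $(1/2)^3 \Phi_{AAA}$ with $\Phi_{AAA} = 1/4$.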

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.

Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^3 \Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^4 \Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2 \Phi_{ssc} = (1/2)^3 \Phi_{sc} = (1/2)^5 \Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{cc} = (1/2)^{2 + 1} \Phi_{cc} = (1/2)^3 \Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4 \Phi_{AA}$ and $\Phi_{abc} = (1/2)^5 \Phi_{AA}$, respectively.

Induction Step. Let $n$ denote the number of edges in $G$. Assume the formula is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$\begin{aligned}
\Phi_{abc} &= \frac{1}{2} \Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA}, \\
&\text{and since } L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
\Phi_{abc} &= \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}.
\end{aligned} \tag{B.3}$$

For Figures 21(b) and 21(c), we take the same steps as in the calculation of $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,

$$\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B.4}$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2) \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, it shows that the path-counting formula (A.1) is true for Case 2.3.
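Wright's formula as used in this proof can be checked against the pairwise recursion on a small graph. The sketch below is illustrative and rests on assumptions that are not in the paper: the toy pedigree, the dictionary encoding, the function names, and unrelated noninbred founders. It enumerates all descent paths naively, which is workable only for tiny pedigrees and is unrelated to the NodeCodes-based evaluation.

```python
from fractions import Fraction
from itertools import product

# Hypothetical pedigree: child -> (parent_1, parent_2); founders have no entry.
PARENTS = {"f": ("A", "x"), "m": ("A", "y"), "a": ("f", "m"), "b": ("A", "z")}


def depth(ind):
    """Longest number of generations between ind and a founder."""
    parents = PARENTS.get(ind)
    return 0 if parents is None else 1 + max(depth(p) for p in parents)


def kinship_recursive(a, b):
    """Pairwise kinship by the parent recursion (founders assumed unrelated)."""
    parents_a = PARENTS.get(a)
    if a == b:
        return Fraction(1, 2) if parents_a is None else (1 + kinship_recursive(*parents_a)) / 2
    if depth(a) < depth(b):
        return kinship_recursive(b, a)   # recurse on the deeper individual
    if parents_a is None:
        return Fraction(0)               # two distinct founders
    return (kinship_recursive(parents_a[0], b) + kinship_recursive(parents_a[1], b)) / 2


def ancestors(ind):
    """ind together with all of its ancestors."""
    found = {ind}
    for parent in PARENTS.get(ind, ()):
        found |= ancestors(parent)
    return found


def paths_down(ancestor, target):
    """All descent paths from ancestor to target, as tuples of individuals."""
    if ancestor == target:
        return [(target,)]
    return [path + (target,)
            for parent in PARENTS.get(target, ())
            for path in paths_down(ancestor, parent)]


def kinship_path_counting(a, b):
    """Sum (1/2)^L * Phi_AA over nonoverlapping path-pairs from each common ancestor."""
    total = Fraction(0)
    for anc in ancestors(a) & ancestors(b):
        for path_a, path_b in product(paths_down(anc, a), paths_down(anc, b)):
            if set(path_a) & set(path_b) == {anc}:   # the paths share only the root
                length = (len(path_a) - 1) + (len(path_b) - 1)
                total += Fraction(1, 2) ** length * kinship_recursive(anc, anc)
    return total


print(kinship_path_counting("a", "b"), kinship_recursive("a", "b"))  # 1/8 1/8
```

Here $A$ is the only common ancestor of $a$ and $b$, and the two nonoverlapping path-pairs (through $f$ and through $m$) each contribute $(1/2)^3 \cdot 1/2 = 1/16$.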

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven above. Thus we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).

Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Edge legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A Given a pedigree graph 119866 and three individuals 119886119887 119888 having at least one trip-common ancestorΦ

119886119887119888is correctly

computed using the path counting formulas (12) and (A1)

Proof Proof by induction on the number of triple-commonancestorsBasis 119866 has only one triple-common ancestor of 119886 119887 and 119888

The correctness of (12) and (A1) for 119866 with only one tri-ple-common ancestor of 119886 119887 and 119888 is proven in the previoussection

Induction Hypothesis Assume that if 119866 has 119896 or less triple-common ancestors of 119886 119887 and 119888 (12) and (A1) are correct for119866

Induction Step Now we show that it is true for 119866 with 119896 + 1triple-common ancestors of 119886 119887 and 119888

Let 119879119903119894 119862(119886 119887 119888 119866) denote all triple-common ancestorsof 119886 119887 and 119888 in 119866 where 119879119903119894 119862(119886 119887 119888 119866) = 119860

119894| 1 le 119894 le 119896 +

1 Let 1198601be the most top triple-common ancestor such that

there is no individual among the remaining ancestors 119860119894|

2 le 119894 le 119896 + 1 who is an ancestor of 1198601 Let 119878(119860

1) denote the

contribution from 1198601to Φ119886119887119888

Because119860

1is themost top triple-common ancestor there

is no path-triple from 119860119894| 2 le 119894 le 119896 + 1 to 119886 119887 and

119888 which passes through 1198601 Then we can remove 119860

1from

119866 and delete all out-going edges from 1198601and obtain a new

graph 1198661015840 which has 119896 triple-common ancestors of 119886 119887 and 119888It means 119879119903119894 119862(119886 119887 119888 1198661015840) = 119860

119894| 2 le 119894 le 119896 + 1

For the new graph 1198661015840 we can apply our induction

hypothesis and obtainΦ119886119887119888(1198661015840

)For the most top triple-common ancestor 119860

1 there are

two different cases considering its relationship with the othertriple-common ancestors

(1) there is no individual among 119860119894| 2 le 119894 le 119896 + 1 who

is a descendant of 1198601

(2) there is at least one individual among 119860119894| 2 le 119894 le

119896 + 1 who is a descendant of 1198601

For (1) since no individual among 119860119894| 2 le 119894 le 119896 + 1 is a

descendant of 1198601 the set of path-triples from 119860

1to 119886 119887 and

119888 is independent of the set of path-triples from 119860119894| 2 le 119894 le

119896 + 1 to 119886 119887 and 119888 It also means that the contribution from

1198601toΦ119886119887119888

is independent of the contribution from the othertriple-common ancestors

Summing up all contributions we can obtainΦ119886119887119888(119866) =

Φ119886119887119888(1198661015840

) + 119878(1198601)

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a : A_1 \to \cdots \to a,\ t_b : A_1 \to \cdots \to b,\ t_c : A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples in which $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
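The graph manipulation used in this induction step can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes the pedigree is stored as a parent-to-children adjacency map, treats every individual as its own ancestor (as the text does when $c$ is a triple-common ancestor of $a$, $b$, and $c$), and uses hypothetical function names throughout.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # hypothetical representation: parent -> children


def ancestors_or_self(graph: Graph, node: str) -> Set[str]:
    """Every individual that reaches `node` along directed edges, plus `node`."""
    parents_of: Dict[str, List[str]] = {}
    for parent, children in graph.items():
        for child in children:
            parents_of.setdefault(child, []).append(parent)
    seen, stack = {node}, [node]
    while stack:
        current = stack.pop()
        for parent in parents_of.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def triple_common_ancestors(graph: Graph, a: str, b: str, c: str) -> Set[str]:
    """Tri_C(a, b, c, G): individuals that are ancestors of all of a, b, and c."""
    return (ancestors_or_self(graph, a)
            & ancestors_or_self(graph, b)
            & ancestors_or_self(graph, c))


def topmost(graph: Graph, tri_c: Set[str]) -> str:
    """Pick an A_1 in tri_c that has no proper ancestor inside tri_c."""
    for candidate in sorted(tri_c):
        proper_ancestors = ancestors_or_self(graph, candidate) - {candidate}
        if not (proper_ancestors & tri_c):
            return candidate
    raise ValueError("no topmost ancestor found; the graph may contain a cycle")


def remove_ancestor(graph: Graph, a1: str) -> Graph:
    """Build G': drop A_1 together with all of its outgoing edges."""
    return {parent: [ch for ch in children if ch != a1]
            for parent, children in graph.items() if parent != a1}


# Toy pedigree (an assumption for illustration): X -> Y, X -> c, Y -> a, b, c.
g = {"X": ["Y", "c"], "Y": ["a", "b", "c"]}
tri_c = triple_common_ancestors(g, "a", "b", "c")   # {"X", "Y"}, so k + 1 = 2
a1 = topmost(g, tri_c)                              # "X"
g_prime = remove_ancestor(g, a1)
remaining = triple_common_ancestors(g_prime, "a", "b", "c")  # {"Y"}, so k = 1
```

In this toy pedigree the removal leaves exactly one triple-common ancestor, which is the situation the induction hypothesis is applied to.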

C. Proof for Four Individuals and Two Pairs of Individuals

Here, we give a proof sketch for the correctness of the path-counting formulas for four individuals. First, for four individuals in a pedigree graph $G$, we present all the different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$, grouped below by the coefficient to which they contribute.

Cases for $\Phi_{aaab}$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.

Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.

Cases for $\Phi_{aabc}$:

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.

Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlaps.

Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.

Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.

Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.

Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.

Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlaps.

Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.

Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.

(C.1)

Figure 23: Dependency graph for different cases for four individuals.

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals. According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlaps.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlaps.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlaps and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Figure 24: Dependency graph for different cases for two pairs of individuals.

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malecot, Les mathematique de l'heredite, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identite en genetique," Annales de l'Institut Henri Poincare B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identite entre deux individuals," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ (scenarios $S_0$, $S_1$, $S_2$, and $S_3$).

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2}\rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$, by Lemma A.2. Now, we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$\Phi_{aab} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle} + 1} \Phi_{AA} \right), \tag{A.1}$$

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has no root 2-overlap.

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.2}$$
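As a hedged illustration of how a single path-triple contributes to $\Phi_{aab}$ in the formula above, the sketch below evaluates one term per type with exact fractions. The function name and argument names are hypothetical, and the default values $\Phi_{AA} = 1/2$ and $\Phi_{AAA} = 1/4$ assume that the ancestor $A$ is a non-inbred founder.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_aab_term(kind: int, l_a1: int, l_a2: int, l_b: int, l_s: int = 0,
                 phi_AA: Fraction = Fraction(1, 2),
                 phi_AAA: Fraction = Fraction(1, 4)) -> Fraction:
    """Contribution of one path-triple <P_Aa1, P_Aa2, P_Ab> to Phi_aab.

    kind is the Type (1, 2, or 3); l_* are path lengths in edges; l_s is the
    length of the root 2-overlap path P_As (used for Type 2 only).
    """
    if kind == 1:                       # not mergeable, no root 2-overlap
        l_triple = l_a1 + l_a2 + l_b
        return HALF ** (l_triple - 1) * phi_AAA
    if kind == 2:                       # not mergeable, one root 2-overlap
        l_triple = l_a1 + l_a2 + l_b - l_s
        return HALF ** l_triple * phi_AA
    if kind == 3:                       # mergeable: the two paths to a coincide
        if l_a1 != l_a2:
            raise ValueError("mergeable paths must have equal length")
        l_pair = l_a1 + l_b
        return HALF ** (l_pair + 1) * phi_AA
    raise ValueError("kind must be 1, 2, or 3")


# Toy pedigree A -> f -> a and A -> b (a has a single known parent f):
# the only path-triple is of Type 3, with L_pair = 2 + 1 = 3.
type3 = phi_aab_term(3, l_a1=2, l_a2=2, l_b=1)   # (1/2)^4 * 1/2 = 1/32

# Toy pedigree A -> f -> a, A -> m -> a, and A -> b: a Type 1 triple with
# L_triple = 2 + 2 + 1 = 5.
type1 = phi_aab_term(1, l_a1=2, l_a2=2, l_b=1)   # (1/2)^4 * 1/4 = 1/64
```

The Type 3 value agrees with expanding the recursion by hand for the first toy pedigree: $\Phi_{aab} = \tfrac12\Phi_{ab}$ and $\Phi_{ab} = (1/2)^3\,\Phi_{AA}$.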

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.

A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.

Now, we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

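$$\Phi_{aabc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}} - 1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AA} \right) + \sum_{A} \left( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 2} \Phi_{AA} \right), \tag{A.3}$$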

where $A$ is a common ancestor of $a$, $b$, and $c$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;

Case 2: one root 3-overlap path $P_{At}$ ending at $t$;

Case 3: one root 2-overlap path and one root 3-overlap path, $P_{As}$ and $P_{At}$, ending at $s$ and $t$, respectively.

(A.4)

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path.

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases}$$

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}. \end{cases} \tag{A.5}$$


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$, and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. Therefore, for a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.
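The counting argument behind this lemma is a pigeonhole over the two alleles of $a$. The short enumeration below is only an illustration at the allele level, under the assumption that each of the three genes drawn from $a$ is independently the paternal or the maternal allele; the labels are hypothetical.

```python
from itertools import product

# Each of the three genes drawn from individual a is either the allele
# inherited from the father f or the allele inherited from the mother m.
ALLELES = ("paternal", "maternal")
draws = list(product(ALLELES, repeat=3))          # 2^3 = 8 equally likely outcomes

# Pigeonhole: with two alleles and three draws, some pair of draws must agree.
every_draw_has_a_matching_pair = all(len(set(d)) < 3 for d in draws)

# Outcomes in which all three draws agree (a fully mergeable triple) versus
# outcomes in which exactly one pair agrees.
all_same = sum(1 for d in draws if len(set(d)) == 1)   # 2 of 8
one_pair = sum(1 for d in draws if len(set(d)) == 2)   # 6 of 8
```

Under this assumption the two groups have probabilities $2/8 = 1/4$ and $6/8 = 3/4$.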

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as a default mergeable path-pair.

Now, we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

$$\Phi_{aaab} = \sum_{A} \left( \frac{3}{2} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} \right) + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{pair}} + 2} \Phi_{AA} \right), \tag{A.6}$$

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path.

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.7}$$

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
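The factor $3/2$ in the formula for $\Phi_{aaab}$ can be traced back to the recursive formulas. The derivation below is a sketch under two assumptions: that the recursions of [10] take the forms $\Phi_{aab} = \tfrac12(\Phi_{ab} + \Phi_{fmb})$ and $\Phi_{aaab} = \tfrac14(\Phi_{ab} + 3\Phi_{fmb})$ for an individual $a$ with parents $f$ and $m$ that is not an ancestor of $b$, and that the Type 1 and Type 2 terms of the $\Phi_{aab}$ formula together account for $\tfrac12\Phi_{fmb}$. The symbol $N$ is introduced here only for illustration.

```latex
\begin{aligned}
\Phi_{aab}  &= \tfrac{1}{2}\,\Phi_{ab} + \underbrace{\tfrac{1}{2}\,\Phi_{fmb}}_{N}, \\
\Phi_{aaab} &= \tfrac{1}{4}\,\Phi_{ab} + \tfrac{3}{4}\,\Phi_{fmb}
             = \tfrac{1}{4}\,\Phi_{ab} + \tfrac{3}{2}\,N .
\end{aligned}
```

With Wright's formula $\Phi_{ab} = \sum (1/2)^{L_{\text{pair}}}\,\Phi_{AA}$, the term $\tfrac14\Phi_{ab}$ yields the exponent $L_{\text{pair}} + 2$ of the Type 3 sum, and $\tfrac32 N$ yields the Type 1 and Type 2 sums scaled by $3/2$.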

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$.

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ and contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases, grouped below by the coefficient to which they contribute.

Cases for $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.

Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

Cases for $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

(B.1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Legend: parent-child relationship; ancestor-descendant relationship.)

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof. (Basis) There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^2 \Phi_{AAc} = (1/2)^3 \Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}} \Phi_{ccc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA} = (1/2)^3 \Phi_{AAA}$.
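The basis values can be checked numerically. The sketch below is not the authors' implementation: it assumes that the recursions of [10] for one, two, and three genes drawn from the same individual can be written in a single form (all $r$ draws hit the same parental allele with probability $2 \cdot (1/2)^r$, otherwise both parental alleles are involved), that founders are non-inbred and mutually unrelated, and that an unknown parent is encoded as `None`. The names `make_phi` and `parents` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from typing import Dict, Optional, Tuple

Parents = Dict[str, Tuple[Optional[str], Optional[str]]]  # child -> (father, mother)


def make_phi(parents: Parents):
    """Return phi(*individuals): the probability that one random gene drawn from
    each listed individual (repeats allowed) are all identical by descent."""

    @lru_cache(maxsize=None)
    def depth(x: str) -> int:
        known = [p for p in parents.get(x, (None, None)) if p is not None]
        return 1 + max(depth(p) for p in known) if known else 0

    @lru_cache(maxsize=None)
    def phi_sorted(inds) -> Fraction:
        if len(inds) == 1:
            return Fraction(1)       # a single gene is trivially IBD with itself
        if None in inds:
            return Fraction(0)       # a gene of an unknown founder matches no other gene
        a = max(inds, key=depth)     # deepest individual: not an ancestor of the others
        r = inds.count(a)            # number of genes drawn from a
        rest = tuple(x for x in inds if x != a)
        f, m = parents.get(a, (None, None))
        same = Fraction(1, 2) ** r   # all r draws are the paternal (or all the maternal) allele
        mixed = 1 - 2 * same         # the draws involve both alleles of a
        total = same * (phi(f, *rest) + phi(m, *rest))
        if mixed:
            total += mixed * phi(f, m, *rest)
        return total

    def phi(*inds) -> Fraction:
        key = tuple(sorted(inds, key=lambda x: (x is None, x or "")))
        return phi_sorted(key)

    return phi


# Figure 18(a): c is the single known parent of both a and b.
fig18a = make_phi({"a": ("c", None), "b": ("c", None)})
print(fig18a("a", "b", "c"))   # 1/16, i.e. (1/2)^2 * Phi_ccc with Phi_ccc = 1/4

# Figure 18(b): A is the single known parent of a, b, and c.
fig18b = make_phi({"a": ("A", None), "b": ("A", None), "c": ("A", None)})
print(fig18b("a", "b", "c"))   # 1/32, i.e. (1/2)^3 * Phi_AAA with Phi_AAA = 1/4
```

Both outputs match the exponents obtained above from the recursive formula and from the path-counting formula.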

Induction Step. Let $n$ denote the number of edges in $G$. Assume that the claim is true for $n \le k$, where $k \ge 2$. Then we show that it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are the parents of $a$. In $G$, $a$ has only one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AAA}, \\ &\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\ &\therefore\ \Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA}. \end{aligned} \tag{B.2}$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle} + 1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}} \Phi_{ccc}$.

Thus, it is true for $n = k + 1$.

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof. (Basis) There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.

Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^3 \Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^4 \Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2 \Phi_{ssc} = (1/2)^3 \Phi_{sc} = (1/2)^5 \Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle} + 1} \Phi_{cc} = (1/2)^{2 + 1} \Phi_{cc} = (1/2)^3 \Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4 \Phi_{AA}$ and $\Phi_{abc} = (1/2)^5 \Phi_{AA}$, respectively.
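The Type 2 term of the path-counting formula (12) can also be evaluated directly from explicit paths. The sketch below is illustrative only: the node lists for Figures 20(a) to 20(c) are inferred from the recursive expansions in the text rather than read from the figures, the helper names are hypothetical, and $\Phi_{AA} = 1/2$ assumes a non-inbred founder. It handles a single path-triple with exactly one root 2-overlap and does not check the other eligibility conditions.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Sequence, Tuple

HALF = Fraction(1, 2)


def edges(path: Sequence[str]) -> List[Tuple[str, str]]:
    """Consecutive (parent, child) edges along a path given as a node list."""
    return list(zip(path, path[1:]))


def root_overlap(p: Sequence[str], q: Sequence[str]) -> int:
    """Number of edges shared by p and q starting from their common root."""
    shared = 0
    for e1, e2 in zip(edges(p), edges(q)):
        if e1 != e2:
            break
        shared += 1
    return shared


def type2_term(path_a: Sequence[str], path_b: Sequence[str], path_c: Sequence[str],
               phi_AA: Fraction = HALF) -> Fraction:
    """(1/2)^(L + 1) * Phi_AA with L = L_PAa + L_PAb + L_PAc - L_PAs."""
    paths = (path_a, path_b, path_c)
    overlaps = [root_overlap(p, q) for p, q in combinations(paths, 2)]
    shared = [o for o in overlaps if o > 0]
    if len(shared) != 1:
        raise ValueError("expected exactly one root 2-overlap among the three paths")
    length = sum(len(p) - 1 for p in paths) - shared[0]
    return HALF ** (length + 1) * phi_AA


# Figure 20(a): c -> b -> a; the paths to a and b share the edge c -> b.
term_a = type2_term(["c", "b", "a"], ["c", "b"], ["c"])        # (1/2)^3 * 1/2
# Figure 20(b): A -> b -> a and A -> c.
term_b = type2_term(["A", "b", "a"], ["A", "b"], ["A", "c"])   # (1/2)^4 * 1/2
# Figure 20(c): A -> s -> a, A -> s -> b, and A -> c.
term_c = type2_term(["A", "s", "a"], ["A", "s", "b"], ["A", "c"])  # (1/2)^5 * 1/2
```

The three exponents (3, 4, and 5) match those obtained from the recursive formula for the same scenarios.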

Induction Step. Let $n$ denote the number of edges in $G$. Assume that the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$ which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1 + 1} \Phi_{AA}, \\ &\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\ &\therefore\ \Phi_{abc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1 + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AA}. \end{aligned} \tag{B.3}$$

For Figures 21(b) and 21(c), we take the same steps as we did to calculate $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another. (b) $b$ is a parent of $a$. (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ has only one parent $f$,

\[
\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}).
\tag{B.4}
\]

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, the path-counting formula (A.1) is correct for Case 2.3.
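The reduction of Case 2.3 to Wright's formula can be illustrated with a small path enumeration. This is a sketch under stated assumptions rather than the authors' code: the helper names `paths`, `kinship_by_paths`, and `phi_aab_type3` are hypothetical, the pedigree is an invented instance of the Figure 22(a) scenario ($A \to f \to a$ and $A \to b$, with $a$ having the single parent $f$), and "nonoverlapping" is taken to mean that the two paths share no individual other than the common ancestor.

```python
from fractions import Fraction


def paths(children, src, dst):
    """All directed paths from src to dst, each as a tuple of individuals."""
    if src == dst:
        return [(src,)]
    found = []
    for child in children.get(src, ()):
        found += [(src,) + tail for tail in paths(children, child, dst)]
    return found


def kinship_by_paths(children, phi_self, a, b):
    """Wright-style sum of (1/2)**(L_Aa + L_Ab) * Phi_AA over path-pairs."""
    total = Fraction(0)
    for root, phi_aa in phi_self.items():
        for pa in paths(children, root, a):
            for pb in paths(children, root, b):
                if set(pa) & set(pb) == {root}:   # paths share the root only
                    length = (len(pa) - 1) + (len(pb) - 1)
                    total += Fraction(1, 2) ** length * phi_aa
    return total


def phi_aab_type3(children, phi_self, a, b):
    """Type 3 contribution: each path-pair gains one extra factor of 1/2."""
    return kinship_by_paths(children, phi_self, a, b) / 2


# Hypothetical pedigree: A -> f -> a and A -> b; nobody is inbred, so
# Phi_xx = 1/2 for every individual x.
CHILDREN = {"A": ["f", "b"], "f": ["a"]}
PHI_SELF = {x: Fraction(1, 2) for x in ("A", "f", "a", "b")}

print(kinship_by_paths(CHILDREN, PHI_SELF, "a", "b"))  # Phi_ab
print(phi_aab_type3(CHILDREN, PHI_SELF, "a", "b"))     # Phi_aab
```

Here the only path-pair is $\langle A \to f \to a,\ A \to b\rangle$ with total length 3, so $\Phi_{ab} = (1/2)^{3}\cdot(1/2) = 1/16$ and the Type 3 term gives $\Phi_{aab} = 1/32 = (1/2)\Phi_{ab}$, as in (B.4).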

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 has already been proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).


Figure 22: (a) $b$ is not an ancestor of $a$. (b) $b$ is an ancestor of $a$. (Edge legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. Proof by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.

Induction Step. Now we show that the claim is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathrm{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathrm{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. That is, $\mathrm{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases, considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \to \cdots \to a,\ t_b\colon A_1 \to \cdots \to b,\ t_c\colon A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
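As a worked illustration of case (1), consider a hypothetical pedigree that does not appear in the paper: three full siblings $a$, $b$, and $c$ whose parents $F$ and $M$ are unrelated, non-inbred founders, so that $\Phi_{FFF} = \Phi_{MMM} = 1/4$ and neither triple-common ancestor is a descendant of the other. Each ancestor contributes a single path-triple without root overlap, of total length 3. Taking $A_1 = F$, the graph $G'$ is $G$ with $F$ and its outgoing edges removed, so $M$ is its only triple-common ancestor and $\Phi_{abc}(G') = S(M)$:

\[
\begin{aligned}
S(F) &= \left(\tfrac{1}{2}\right)^{3}\Phi_{FFF} = \tfrac{1}{32}, \qquad
S(M) = \left(\tfrac{1}{2}\right)^{3}\Phi_{MMM} = \tfrac{1}{32},\\
\Phi_{abc}(G) &= \Phi_{abc}(G') + S(F) = \tfrac{1}{32} + \tfrac{1}{32} = \tfrac{1}{16}.
\end{aligned}
\]

The recursive formula (3) gives the same value for this pedigree: $\Phi_{abc} = (1/2)(\Phi_{Fbc} + \Phi_{Mbc}) = (1/2)(1/16 + 1/16) = 1/16$.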

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Cases for $\Phi_{aaab}$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.

Cases for $\Phi_{aabc}$:

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlaps.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlaps.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.

(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

Figure 23: Dependency graph for different cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlaps.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlaps.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlaps and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Figure 24: Dependency graph for different cases for two pairs of individuals.

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing them to the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malecot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 bc}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. Hence, for a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the default mergeable path-pair.

Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

\[
\Phi_{aaab} = \sum_{A}\left(\frac{3}{2}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\right) + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\right),
\tag{A.6}
\]

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path.

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.

\[
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1},\\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases}\\
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A.7}
\]

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
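As a small worked check of the Type 3 term in (A.6), consider a hypothetical pedigree that is not taken from the paper: a non-inbred founder $A$ with children $f$ and $b$, and an individual $a$ whose only known parent is $f$. The only path from $A$ to $a$ is $A \to f \to a$, so $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is necessarily mergeable and only Type 3 contributes, with $L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} = 2 + 1 = 3$ and $\Phi_{AA} = 1/2$:

\[
\begin{aligned}
\Phi_{aaab} &= \left(\tfrac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA} = \left(\tfrac{1}{2}\right)^{5}\cdot\tfrac{1}{2} = \tfrac{1}{64},\\
\Phi_{ab} &= \left(\tfrac{1}{2}\right)^{L_{\text{pair}}}\Phi_{AA} = \left(\tfrac{1}{2}\right)^{3}\cdot\tfrac{1}{2} = \tfrac{1}{16}
\quad\Longrightarrow\quad \Phi_{aaab} = \tfrac{1}{4}\,\Phi_{ab}.
\end{aligned}
\]

In general, the Type 3 sum equals $(1/4)\sum_{P}(1/2)^{L_{\text{pair}}}\Phi_{AA}$, which is $(1/4)\Phi_{ab}$ by Wright's formula. This matches what a Karigl-style recursion of the form $\Phi_{aaab} = (1/4)(\Phi_{ab} + 3\Phi_{fmb})$ would give when one parent is missing; that recursive form is stated here as an assumption, since it is not reproduced in this section.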

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$.

B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases for $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

Cases for $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

(B.1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$. (b) No individual is a parent of another.

Figure 19: (a) No individual is a parent of another. (b) $c$ is an ancestor of $a$ and $b$. (Edge legend: parent-child relationship; ancestor-descendant relationship.)

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^{2}\Phi_{AAc} = (1/2)^{3}\Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = (1/2)^{3}\Phi_{AAA}$.

Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ has only one parent $f$; thus $\Phi_{mbc} = 0$. Then we can substitute the path-counting formula for $\Phi_{fbc}$ to obtain

\[
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc}
 = \frac{1}{2}\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}
 = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA},\\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}.
\end{aligned}
\tag{B.2}
\]

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1}\Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc}$.

Thus, it is true for $n = k + 1$.
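The induction step can be observed on a concrete instance. The sketch below is illustrative only and rests on assumptions: the helper names `paths` and `type1_sum` are hypothetical, the pedigrees are invented examples in the spirit of Figures 18 and 19(a), and a Type 1 path-triple is approximated as three paths from the root that share no individual other than the root. It evaluates $\sum_{\text{Type 1}} (1/2)^{L}\Phi_{AAA}$ before and after the edge $f \to a$ is added.

```python
from fractions import Fraction
from itertools import product


def paths(children, src, dst):
    """All directed paths from src to dst, each as a tuple of individuals."""
    if src == dst:
        return [(src,)]
    found = []
    for child in children.get(src, ()):
        found += [(src,) + tail for tail in paths(children, child, dst)]
    return found


def type1_sum(children, root, phi_root3, trio):
    """Sum (1/2)**L * Phi_AAA over path-triples sharing only the root."""
    total = Fraction(0)
    options = [paths(children, root, x) for x in trio]
    for triple in product(*options):
        bodies = [set(p[1:]) for p in triple]     # individuals below the root
        disjoint = all(not (bodies[i] & bodies[j])
                       for i in range(3) for j in range(i + 1, 3))
        if disjoint:
            length = sum(len(p) - 1 for p in triple)
            total += Fraction(1, 2) ** length * phi_root3
    return total


PHI_AAA = Fraction(1, 4)                     # non-inbred founder A

# G*: A is a parent of f, b, and c (k = 3 edges).
G_STAR = {"A": ["f", "b", "c"]}
# G: the same graph with the extra edge f -> a (k + 1 edges).
G = {"A": ["f", "b", "c"], "f": ["a"]}

phi_fbc = type1_sum(G_STAR, "A", PHI_AAA, ("f", "b", "c"))
phi_abc = type1_sum(G, "A", PHI_AAA, ("a", "b", "c"))
print(phi_fbc, phi_abc)                      # the second is half the first
```

Because the only path from $A$ to $a$ extends the path from $A$ to $f$ by one edge, the exponent grows by one and $\Phi_{abc} = (1/2)\Phi_{fbc}$, which is the relation used in (B.2).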

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.

Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$. (b) $b$ is a parent of $a$. (c) No individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^{2}\Phi_{ssc} = (1/2)^{3}\Phi_{sc} = (1/2)^{5}\Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a) 119888 is the only triple-common ancestorand we obtain Φ

119886119887119888= (12)

119871⟨119875119888119886119875119888119887119875119888119888⟩+1

Φ119888119888= (12)

2+1

Φ119888119888=

(12)3

Φ119888119888 Similarly for Figures 20(b) and 20(c) we obtain

Φ119886119887119888= (12)

4

Φ119860119860

and Φ119886119887119888= (12)

5

Φ119860119860

respectively

Induction Step Let 119899 denote the number of edges in 119866Assume true for 119899 le 119896 where 119896 ge 2 Show that it is truefor = 119896 + 1

For Figures 21(a) 21(b) and 21(c) among 119886 119887 and 119888 let119886 be the individual who has the longest path and let 119901 be aparent of 119886 Then we cut the edge 119901 rarr 119886 from 119866 and obtaina new graph 119866lowast which satisfies the condition of inductionhypothesis For Figure 21(a) we use the path-counting for-mula forΦ

119891119887119888in 119866lowast Φ

119891119887119888= sumType 2(12)

119871⟨119875119860119891119875119860119887119875119860119888⟩+1

Φ119860119860

In 119866 119891 is the only parent of 119886 according to the recursive

formula (3) we have Φ119886119887119888= (12)Φ

119891119887119888 Then we can plug-in

the Φ119891119887119888

and obtain

Φ119886119887119888=1

2Φ119891119887119888

=1

2sum

Type 2(1

2)

119871⟨119875119860119891119875119860119887119875119860119888⟩+1

Φ119860119860

= sum

Type 2(1

2)

119871⟨119875119860119891119875119860119887119875119860119888⟩+1+1

Φ119860119860

∵ 119871⟨119875119860119886 119875119860119887 119875119860119888⟩

= 119871⟨119875119860119891119875119860119887 119875119860119888⟩

+ 1

there4 Φ119886119887119888= sum

Type 2(1

2)

119871⟨119875119860119891119875119860119887119875119860119888⟩+1+1

Φ119860119860

= sum

Type 2(1

2)

119871⟨119875119860119886119875119860119887119875119860119888⟩+1

Φ119860119860

(B3)

For Figures 21(b) and 21(c) we take the same steps as we cal-culate Φ

119886119887119888for Figure 21(a)

In summary it is true for 119899 = 119896 + 1

Figure 21: (a) No individual is a parent of another. (b) $b$ is a parent of $a$. (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ to $\Phi_{aab}$ can be computed as follows: $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1}\,\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ has only one parent, $f$,

$$\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\,\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B.4}$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}}\,\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$.

Then we have $\Phi_{aab} = (1/2)\,\Phi_{ab} = (1/2) \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}}\,\Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1}\,\Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to that of $\Phi_{ab}$.

In summary, the path-counting formula (A.1) is true for Case 2.3.
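The reduction of $\Phi_{aab}$ to $\Phi_{ab}$ can be illustrated with a small Python sketch that enumerates nonoverlapping path-pairs and applies Wright's formula in the form used above, $\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}}\,\Phi_{AA}$. The pedigree encoding (a dictionary from parent to children), the function names, and the simplification that every common ancestor is non-inbred ($\Phi_{AA} = 1/2$) are assumptions introduced for this example; the two toy pedigrees are hypothetical stand-ins for the two scenarios of Figure 22, not reconstructions of the figure.

```python
from fractions import Fraction

def paths(children, src, dst):
    """All directed paths src -> ... -> dst in a pedigree DAG, as node tuples."""
    if src == dst:
        return [(src,)]
    return [(src,) + p
            for child in children.get(src, ())
            for p in paths(children, child, dst)]

def wright_kinship(children, a, b, phi_AA=Fraction(1, 2)):
    """Sum of (1/2)^(L_a + L_b) * Phi_AA over nonoverlapping path-pairs.

    Assumption: every common ancestor A is non-inbred, so Phi_AA = 1/2.
    """
    nodes = set(children) | {c for cs in children.values() for c in cs}
    total = Fraction(0)
    for A in nodes:
        for pa in paths(children, A, a):
            for pb in paths(children, A, b):
                if set(pa) & set(pb) == {A}:  # the paths share only the root
                    edges = (len(pa) - 1) + (len(pb) - 1)
                    total += Fraction(1, 2) ** edges * phi_AA
    return total

def phi_aab_single_parent(children, a, b):
    """Case 2.3 when a has one known parent: Phi_aab = (1/2) * Phi_ab."""
    return wright_kinship(children, a, b) / 2

# Scenario (i): b is not an ancestor of a (A -> f -> a and A -> b).
ped_i = {"A": ["f", "b"], "f": ["a"]}
# Scenario (ii): b is an ancestor of a (b -> f -> a).
ped_ii = {"b": ["f"], "f": ["a"]}

print(wright_kinship(ped_i, "a", "b"), phi_aab_single_parent(ped_i, "a", "b"))
print(wright_kinship(ped_ii, "a", "b"), phi_aab_single_parent(ped_ii, "a", "b"))
```

Exhaustive path enumeration is exponential in the worst case, so this is only suitable for small illustrative pedigrees.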

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 has been proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).

Figure 22: (a) $b$ is not an ancestor of $a$. (b) $b$ is an ancestor of $a$. (Edge legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. By induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.

Induction Step. Now we show that they are correct for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathit{Tri\_C}(a, b, c, G)$ denote the set of all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.
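The set of triple-common ancestors and the choice of a topmost ancestor $A_1$ can be sketched as follows. This is a minimal illustration under stated assumptions: the pedigree is encoded as a dictionary from an individual to a (father, mother) tuple with `None` for a missing parent, each individual is treated as its own ancestor (consistent with $c$ being a triple-common ancestor of $a$, $b$, and $c$ in Figure 18(a)), and the function names and the example pedigree are hypothetical.

```python
def ancestors(parents, x):
    """Reflexive ancestor set of x (x itself is included, by assumption)."""
    seen = {x}
    stack = [x]
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p is not None and p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def triple_common_ancestors(parents, a, b, c):
    """Individuals that can reach all of a, b, and c."""
    return ancestors(parents, a) & ancestors(parents, b) & ancestors(parents, c)

def topmost(parents, tri):
    """Members of tri that have no other member of tri as an ancestor."""
    return {A for A in tri
            if not any(B != A and B in ancestors(parents, A) for B in tri)}

# Hypothetical pedigree: A1 -> A2, A2 -> a, A2 -> b, A1 -> b, A2 -> c.
pedigree = {
    "A2": ("A1", None),
    "a": ("A2", None),
    "b": ("A2", "A1"),
    "c": ("A2", None),
}
tri = triple_common_ancestors(pedigree, "a", "b", "c")
print(sorted(tri), sorted(topmost(pedigree, tri)))
```

In this example $A_2$ is a descendant of $A_1$, which corresponds to the second of the two cases distinguished in the proof.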

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$, which has $k$ triple-common ancestors of $a$, $b$, and $c$. This means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \to \cdots \to a,\ t_b\colon A_1 \to \cdots \to b,\ t_c\colon A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$:

Cases for $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ with zero root overlap.

Cases for $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with two root 2-overlaps.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with two root 2-overlaps.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap and one root 3-overlap.

(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

Figure 23: Dependency graph for different cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlaps.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlaps.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlaps and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Figure 24: Dependency graph for different cases for two pairs of individuals.

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Ohio State University, Columbus, Ohio, USA, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malecot, Les mathematique de l'heredite, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identite en genetique," Annales de l'Institut Henri Poincare B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identite entre deux individus," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


Figure 18: (a) $c$ is a parent of $a$ and $b$. (b) No individual is a parent of another.

Figure 19: (a) No individual is a parent of another. (b) $c$ is an ancestor of $a$ and $b$. (Edge legend: parent-child relationship; ancestor-descendant relationship.)

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\,\Phi_{cbc} = (1/2)^{2}\,\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\,\Phi_{Abc} = (1/2)^{2}\,\Phi_{AAc} = (1/2)^{3}\,\Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}}\,\Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}}\,\Phi_{ccc} = (1/2)^{2}\,\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}}\,\Phi_{AAA} = (1/2)^{3}\,\Phi_{AAA}$.
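The recursive side of this basis can be checked numerically. The sketch below implements recursions of the kind referred to as formulas (3) and (4), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$ and $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$, together with the standard boundary relations $\Phi_{aa} = (1/2)(1 + \Phi_{fm})$ and $\Phi_{aaa} = (1/4)(1 + 3\Phi_{fm})$, which are assumed here from Karigl's recursive scheme [10]. The function names, the pedigree encoding (individual mapped to a (father, mother) tuple, `None` for a missing parent), and the rule of always expanding the deepest individual are choices made for this illustration, not the authors' implementation.

```python
from fractions import Fraction
from functools import lru_cache

def make_phi(parents):
    """Return phi(*ids) for two or three individuals of a small pedigree."""

    def pa(x):
        return parents.get(x, (None, None))

    @lru_cache(maxsize=None)
    def depth(x):
        known = [p for p in pa(x) if p is not None]
        return 1 + max(map(depth, known)) if known else 0

    @lru_cache(maxsize=None)
    def phi(*ids):
        if None in ids:                     # a missing parent contributes 0
            return Fraction(0)
        a = max(ids, key=depth)             # deepest: not an ancestor of the rest
        rest = tuple(i for i in ids if i != a)
        f, m = pa(a)
        copies = len(ids) - len(rest)
        if copies == 1:                     # Phi_ab and Phi_abc
            return (phi(f, *rest) + phi(m, *rest)) / 2
        if copies == 2 and len(ids) == 2:   # Phi_aa
            return (1 + phi(f, m)) / 2
        if copies == 2:                     # Phi_aab
            return (phi(a, *rest) + phi(f, m, *rest)) / 2
        return (1 + 3 * phi(f, m)) / 4      # Phi_aaa

    return phi

# Figure 18(a): c is a parent of a and b.  Figure 18(b): A is a parent of a, b, c.
fig18a = make_phi({"a": ("c", None), "b": ("c", None)})
fig18b = make_phi({"a": ("A", None), "b": ("A", None), "c": ("A", None)})
print(fig18a("a", "b", "c"), fig18b("a", "b", "c"))
```

For founders $c$ and $A$, $\Phi_{ccc} = \Phi_{AAA} = 1/4$, so the printed values equal $(1/2)^{2}\,\Phi_{ccc}$ and $(1/2)^{3}\,\Phi_{AAA}$, in agreement with both derivations above.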

Induction Step. Let $n$ denote the number of edges in $G$. Assume the formula is true for $n \le k$, where $k \ge 2$. Then we show that it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition of the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}}\,\Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are the parents of $a$. In $G$, $a$ has only one parent, $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\,\Phi_{fbc}
 = \frac{1}{2} \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}
 = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AAA}, \\
\because\ & L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
\therefore\ \Phi_{abc} &= \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}.
\end{aligned}
\tag{B.2}
$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle} + 1}\,\Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}}\,\Phi_{ccc}$.

Thus, it is true for $n = k + 1$.
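The key identity in this induction step, that appending the edge $f \to a$ adds one to $L$ and therefore halves the Type 1 term, can be seen in a few lines of Python. The function name, the tuple encoding of paths, the default $\Phi_{AAA} = 1/4$ (a non-inbred common ancestor), and the specific paths through $s$, $v$, and $p$ are assumptions chosen for illustration; they are loosely modeled on the node labels of Figure 19(a) and are not a reconstruction of it.

```python
from fractions import Fraction

def type1_term(p_a, p_b, p_c, phi_AAA=Fraction(1, 4)):
    """(1/2)^L * Phi_AAA for one path-triple without root overlap,
    where L is the total number of edges of the three paths."""
    L = sum(len(p) - 1 for p in (p_a, p_b, p_c))
    return Fraction(1, 2) ** L * phi_AAA

# Hypothetical path-triple in G* (ending at f) and its extension in G (ending at a).
in_g_star = type1_term(("A", "s", "f"), ("A", "v", "b"), ("A", "p", "c"))
in_g = type1_term(("A", "s", "f", "a"), ("A", "v", "b"), ("A", "p", "c"))
print(in_g_star, in_g)
```

Because the extension changes only the exponent, the same halving holds term by term for every Type 1 path-triple in the sum.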

B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.

Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$. (b) $b$ is a parent of $a$. (c) No individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\,\Phi_{bbc} = (1/2)^{2}\,\Phi_{bc} = (1/2)^{3}\,\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\,\Phi_{bbc} = (1/2)^{2}\,\Phi_{bc} = (1/2)^{4}\,\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^{2}\,\Phi_{ssc} = (1/2)^{3}\,\Phi_{sc} = (1/2)^{5}\,\Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1}\,\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a) 119888 is the only triple-common ancestorand we obtain Φ

119886119887119888= (12)

119871⟨119875119888119886119875119888119887119875119888119888⟩+1

Φ119888119888= (12)

2+1

Φ119888119888=

(12)3

Φ119888119888 Similarly for Figures 20(b) and 20(c) we obtain

Φ119886119887119888= (12)

4

Φ119860119860

and Φ119886119887119888= (12)

5

Φ119860119860

respectively

Induction Step Let 119899 denote the number of edges in 119866Assume true for 119899 le 119896 where 119896 ge 2 Show that it is truefor = 119896 + 1

For Figures 21(a) 21(b) and 21(c) among 119886 119887 and 119888 let119886 be the individual who has the longest path and let 119901 be aparent of 119886 Then we cut the edge 119901 rarr 119886 from 119866 and obtaina new graph 119866lowast which satisfies the condition of inductionhypothesis For Figure 21(a) we use the path-counting for-mula forΦ

119891119887119888in 119866lowast Φ

119891119887119888= sumType 2(12)

119871⟨119875119860119891119875119860119887119875119860119888⟩+1

Φ119860119860

In 119866 119891 is the only parent of 119886 according to the recursive

formula (3) we have Φ119886119887119888= (12)Φ

119891119887119888 Then we can plug-in

the Φ119891119887119888

and obtain

Φ119886119887119888=1

2Φ119891119887119888

=1

2sum

Type 2(1

2)

119871⟨119875119860119891119875119860119887119875119860119888⟩+1

Φ119860119860

= sum

Type 2(1

2)

119871⟨119875119860119891119875119860119887119875119860119888⟩+1+1

Φ119860119860

∵ 119871⟨119875119860119886 119875119860119887 119875119860119888⟩

= 119871⟨119875119860119891119875119860119887 119875119860119888⟩

+ 1

there4 Φ119886119887119888= sum

Type 2(1

2)

119871⟨119875119860119891119875119860119887119875119860119888⟩+1+1

Φ119860119860

= sum

Type 2(1

2)

119871⟨119875119860119886119875119860119887119875119860119888⟩+1

Φ119860119860

(B3)

For Figures 21(b) and 21(c) we take the same steps as we cal-culate Φ

119886119887119888for Figure 21(a)

In summary it is true for 119899 = 119896 + 1

A

a

s

t

f

b

c

(a)

a

t

b

A

s c

(b)

a

s

t

b

c

(c)Figure 21 (a) No individual who is a parent of another (b) 119887 is aparent of 119886 (c) 119887 is a parent of 119886 and 119888 is an ancestor of 119887

B13 Correctness Proof for Case 23

Case 23 For Φ119886119886119887

the path-triples in the pedigree graph 119866have mergeable path-pair

Proof Considering the relationship between 119886 and 119887 119866has two scenarios (i) 119887 is not an ancestor of 119886 (ii) 119887 isan ancestor of 119886 Using the path-counting formula (A1)if a path-triple ⟨119875

1198601198861 1198751198601198862 119875119860119887⟩ isin Type 3 which means

that it has a mergeable path-pair then the contributionof ⟨1198751198601198861 1198751198601198862 119875119860119887⟩ to Φ

119886119886119887can be computed as follows

sumType 3(12)119871⟨119875119860119886119875119860119887

⟩+1Φ119860119860

where 119871⟨119875119860119886 119875119860119887⟩

= 119871119875119860119886+ 119871119875119860119887

Using the recursive formula (4) we obtain Φ

119886119886119887=

(12)(Φ119886119887+ Φ119891119898119887)

For Figure 22(a) 119860 is a common ancestor of 119886 and 119887∵ 119886 only has one parent 119891

there4 Φ119886119886119887

=1

2(Φ119886119887+ Φ119891119898119887)

=1

2(Φ119886119887+ 0) =

1

2Φ119886119887

(as 119898 is missing) (B4)

For Φ119886119887 we use Wrightrsquos formula and obtain Φ

119886119887=

sum119875(12)119871⟨119875119860119886119875119860119887

⟩Φ119860119860

where 119875 denotes all nonoverlappingpath-pairs ⟨119875

119860119886 119875119860119887⟩

Then we have Φ119886119886119887

= (12)Φ119886119887

=

(12)sum119875(12)119871⟨119875119860119886119875119860119887

⟩Φ119860119860= sum119875(12)119871⟨119875119860119886119875119860119887

⟩+1Φ119860119860

For Figure 22(b) we can also transform the computation

of Φ119886119886119887

to Φ119886119887

In summary it shows that the path-counting formula(A1) is true for Case 23

B14 Correctness Proof for Cases 21 and 22 For Φ119886119886119887

whenthere is no path-triple having mergeable path-pair (ie thepath-triple belongs to either Case 21 or Case 23)Φ

119886119886119887can be

transformed toΦ11988611198862119887

which is equivalent to the computationof Φ119886119887119888

for Cases 31 and 32 The correctness of our path-counting formula for Cases 31 and 32 is proven Thus weobtain the correctness for Φ

119886119886119887when the path-triple belongs

to either Case 21 or Case 22

B2 Multiple Triple-Common Ancestors Now we providethe correctness proof for multiple triple-common ancestorsregarding the path-counting formulas (12) and (A1)

18 Computational and Mathematical Methods in Medicine

A

a

s

w

t

f

b

Parent-child relationshipAncestor-descendant relationship

(a)

a

s

f

b

Parent-child relationshipAncestor-descendant relationship

(b)

Figure 22 (a) 119887 is not an ancestor of 119886 (b) 119887 is an ancestor of 119886

Lemma A Given a pedigree graph 119866 and three individuals 119886119887 119888 having at least one trip-common ancestorΦ

119886119887119888is correctly

computed using the path counting formulas (12) and (A1)

Proof Proof by induction on the number of triple-commonancestorsBasis 119866 has only one triple-common ancestor of 119886 119887 and 119888

The correctness of (12) and (A1) for 119866 with only one tri-ple-common ancestor of 119886 119887 and 119888 is proven in the previoussection

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.

Induction Step. We now show that the formulas are correct for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathit{Tri\_C}(a, b, c, G)$ denote the set of all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be a topmost triple-common ancestor, that is, one such that no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is a topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ that passes through $A_1$. We can therefore remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ that has $k$ triple-common ancestors of $a$, $b$, and $c$; that is, $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply the induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two cases, depending on its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. Hence the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a : A_1 \to \cdots \to a,\ t_b : A_1 \to \cdots \to b,\ t_c : A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples, in which $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have all three paths passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
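The bookkeeping used in this induction step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pedigree, the `parents` dictionary layout, and the helper names `ancestors`, `triple_common`, `topmost`, and `eligible` are all hypothetical, and an individual is counted as its own ancestor so that a path of length zero is allowed. The sketch finds the triple-common ancestors, picks the topmost ones, and rejects a path-triple whose three paths all pass through a lower triple-common ancestor.

```python
def ancestors(parents, x):
    """All ancestors of x, including x itself (assumed convention)."""
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        if v is not None and v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, (None, None)))
    return seen

def triple_common(parents, a, b, c):
    """Individuals that are ancestors of a, b, and c simultaneously."""
    return ancestors(parents, a) & ancestors(parents, b) & ancestors(parents, c)

def topmost(parents, tri):
    """Members of tri that have no other member of tri as an ancestor."""
    return {A for A in tri if not (ancestors(parents, A) - {A}) & tri}

def eligible(path_triple, lower):
    """False if all three paths pass through one lower triple-common ancestor."""
    return not any(all(x in p for p in path_triple) for x in lower)

# Hypothetical pedigree: child -> (father, mother); None marks a missing parent.
parents = {
    "A2": ("A1", None), "x": ("A1", None),
    "a": ("A2", None), "b": ("A2", None), "c": ("A2", "x"),
}
tri = triple_common(parents, "a", "b", "c")
top = topmost(parents, tri)
print(sorted(tri), sorted(top))  # ['A1', 'A2'] ['A1']

lower = tri - top
through_A2 = [["A1", "A2", "a"], ["A1", "A2", "b"], ["A1", "A2", "c"]]
via_x = [["A1", "A2", "a"], ["A1", "A2", "b"], ["A1", "x", "c"]]
print(eligible(through_A2, lower), eligible(via_x, lower))  # False True
```

The check covers only the exclusion rule discussed in case (2); the overlap conditions that define the path-triple types in (12) and (A.1) are not modeled here.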

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First, for four individuals in a pedigree graph $G$, we present all the different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Cases for $\Phi_{aaab}$:

Case 2.1. $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with zero root overlap.
Case 2.2. $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with one root overlap.
Case 2.3. $G$ has path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ with zero root overlap.

Cases for $\Phi_{aabc}$:

Case 3.1. $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.2. $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap.
Case 3.3.1. $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with two root 2-overlap.
Case 3.3.2. $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 3-overlap.
Case 3.3.3. $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4. $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.5. $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:

Case 4.1. $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with zero root overlap.
Case 4.2. $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap.
Case 4.3.1. $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with two root 2-overlap.
Case 4.3.2. $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 3-overlap.
Case 4.3.3. $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap and one root 3-overlap.

(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

Figure 23: Dependency graph for different cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pair regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1. $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2. $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1. $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlap.

Case 4.3.2. $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlap.

Case 4.4. $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5. $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlap and zero root heter-overlap.

Case 4.6. $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.

Case 4.7. $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

Case 4.8. $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Figure 24: Dependency graph for different cases for two pairs of individuals. (Node labels recoverable from the figure: Cases 4.1, 4.2, 4.3.1, 4.3.2, 4.4, 4.6, 4.7, and 4.8, together with $\Phi_{AAAA}$, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{TT}$.)

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing them to the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci, et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna, et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Ohio State University, Columbus, Ohio, USA, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malecot, Les mathematique de l'heredite, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identite en genetique," Annales de l'Institut Henri Poincare B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identite entre deux individuals," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko, et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.


Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20, we obtain the following: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^3 \Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^4 \Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2 \Phi_{ssc} = (1/2)^3 \Phi_{sc} = (1/2)^5 \Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has a root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows:

\[
\sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA},
\]

where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{cc} = (1/2)^{2+1} \Phi_{cc} = (1/2)^3 \Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4 \Phi_{AA}$ and $\Phi_{abc} = (1/2)^5 \Phi_{AA}$, respectively.
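The agreement between the recursive and the path-counting results for Figure 20 can be checked numerically with a short Python sketch. This is an illustration under stated assumptions rather than the authors' code: the function `make_phi` and the `parents` dictionaries are hypothetical, the three pedigrees are read off the Figure 20 caption with one known parent per individual, founders are taken to be non-inbred, and the boundary rules $\Phi_{aa} = (1/2)(1 + \Phi_{fm})$ and $\Phi_{aaa} = (1/4)(1 + 3\Phi_{fm})$ are the ones from Karigl's recursion [10]. The recursion always expands a deepest individual first, so the expanded individual is never an ancestor of the others.

```python
from fractions import Fraction

def make_phi(parents):
    """parents maps an individual to (father, mother); None marks a missing parent."""
    def depth(x):
        f, m = parents.get(x, (None, None))
        return 1 + max((depth(p) for p in (f, m) if p is not None), default=-1)

    def phi(*inds):
        if any(i is None for i in inds):
            return Fraction(0)  # a missing individual contributes nothing
        inds = sorted(inds, key=lambda x: (depth(x), x), reverse=True)
        a, n = inds[0], inds.count(inds[0])  # deepest individual and its multiplicity
        f, m = parents.get(a, (None, None))
        others = [x for x in inds if x != a]
        if len(inds) == 2:
            if n == 2:
                return (1 + phi(f, m)) / 2                      # Phi_aa
            return (phi(f, others[0]) + phi(m, others[0])) / 2  # Phi_ab
        if n == 3:
            return (1 + 3 * phi(f, m)) / 4                      # Phi_aaa
        if n == 2:
            return (phi(a, others[0]) + phi(f, m, others[0])) / 2  # Phi_aab, formula (4)
        return (phi(f, *others) + phi(m, *others)) / 2             # Phi_abc, formula (3)
    return phi

half = Fraction(1, 2)
fig20a = {"b": ("c", None), "a": ("b", None)}
fig20b = {"b": ("A", None), "a": ("b", None), "c": ("A", None)}
fig20c = {"s": ("A", None), "a": ("s", None), "b": ("s", None), "c": ("A", None)}

pa, pb, pc = make_phi(fig20a), make_phi(fig20b), make_phi(fig20c)
# Path-counting side: (1/2)^(L + 1) * Phi_AA with L = 2, 3, 4 for panels (a), (b), (c).
print(pa("a", "b", "c") == half ** 3 * pa("c", "c"))  # True
print(pb("a", "b", "c") == half ** 4 * pb("A", "A"))  # True
print(pc("a", "b", "c") == half ** 5 * pc("A", "A"))  # True
```

Exact `Fraction` arithmetic is used so that the two sides can be compared with `==` instead of a floating-point tolerance.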

Induction Step. Let $n$ denote the number of edges in $G$. Assume that the formula is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$ that satisfies the condition of the induction hypothesis. For Figure 21(a), where the parent of $a$ is $f$, we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$:

\[
\Phi_{fbc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}.
\]

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

\[
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc}
 = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}
 = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA}, \\
\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} &= L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
\therefore\ \Phi_{abc} &= \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA}
 = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}.
\end{aligned}
\tag{B.3}
\]

For Figures 21(b) and 21(c), we take the same steps as in the calculation of $\Phi_{abc}$ for Figure 21(a).

In summary, the formula is true for $n = k + 1$.

Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.

B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ to $\Phi_{aab}$ can be computed as follows:

\[
\sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA},
\]

where $L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ has only one parent $f$,

\[
\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad \text{(as $m$ is missing)}. \tag{B.4}
\]

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$.

Then we have

\[
\Phi_{aab} = \frac{1}{2}\Phi_{ab} = \frac{1}{2} \sum_{P} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA} = \sum_{P} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}.
\]

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, the path-counting formula (A.1) is true for Case 2.3.
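The Case 2.3 argument can be sketched numerically in Python. This is only an illustration under assumptions that are not in the paper: the helper names `paths` and `kinship_wright`, the `children` adjacency layout, and the small pedigree (a simplified stand-in for Figure 22(a), in which $a$ has the single parent $f$) are hypothetical, and the common ancestor $A$ is assumed to be a non-inbred founder, so $\Phi_{AA} = 1/2$. The sketch evaluates Wright's formula over nonoverlapping path-pairs and then applies (B.4).

```python
from fractions import Fraction

def paths(children, src, dst):
    """All directed paths src -> ... -> dst, each as a list of individuals."""
    if src == dst:
        return [[src]]
    return [[src] + p for ch in children.get(src, ()) for p in paths(children, ch, dst)]

def kinship_wright(children, common_ancestors, a, b, phi_AA=Fraction(1, 2)):
    """Sum (1/2)^(L_Aa + L_Ab) * Phi_AA over nonoverlapping path-pairs."""
    total = Fraction(0)
    for A in common_ancestors:
        for p_a in paths(children, A, a):
            for p_b in paths(children, A, b):
                if set(p_a) & set(p_b) == {A}:  # the two paths share only the root A
                    length = (len(p_a) - 1) + (len(p_b) - 1)
                    total += Fraction(1, 2) ** length * phi_AA
    return total

# Hypothetical pedigree: A -> f -> a and A -> b; the mother of a is missing.
children = {"A": ["f", "b"], "f": ["a"]}
phi_ab = kinship_wright(children, ["A"], "a", "b")
phi_aab = phi_ab / 2  # (B.4): Phi_aab = (1/2) * Phi_ab when m is missing
print(phi_ab, phi_aab)  # 1/16 1/32
```

Here the single path-pair has $L_{\langle P_{Aa}, P_{Ab} \rangle} = 2 + 1 = 3$, so $\Phi_{aab} = (1/2)^{3+1} \cdot (1/2) = 1/32$, matching the Type 3 term of (A.1) for this example.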

B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).

Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. The legend distinguishes parent-child relationships from ancestor-descendant relationships.

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. The proof is by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis Assume that if 119866 has 119896 or less triple-common ancestors of 119886 119887 and 119888 (12) and (A1) are correct for119866

Induction Step Now we show that it is true for 119866 with 119896 + 1triple-common ancestors of 119886 119887 and 119888

Let 119879119903119894 119862(119886 119887 119888 119866) denote all triple-common ancestorsof 119886 119887 and 119888 in 119866 where 119879119903119894 119862(119886 119887 119888 119866) = 119860

119894| 1 le 119894 le 119896 +

1 Let 1198601be the most top triple-common ancestor such that

there is no individual among the remaining ancestors 119860119894|

2 le 119894 le 119896 + 1 who is an ancestor of 1198601 Let 119878(119860

1) denote the

contribution from 1198601to Φ119886119887119888

Because119860

1is themost top triple-common ancestor there

is no path-triple from 119860119894| 2 le 119894 le 119896 + 1 to 119886 119887 and

119888 which passes through 1198601 Then we can remove 119860

1from

119866 and delete all out-going edges from 1198601and obtain a new

graph 1198661015840 which has 119896 triple-common ancestors of 119886 119887 and 119888It means 119879119903119894 119862(119886 119887 119888 1198661015840) = 119860

119894| 2 le 119894 le 119896 + 1

For the new graph 1198661015840 we can apply our induction

hypothesis and obtainΦ119886119887119888(1198661015840

)For the most top triple-common ancestor 119860

1 there are

two different cases considering its relationship with the othertriple-common ancestors

(1) there is no individual among 119860119894| 2 le 119894 le 119896 + 1 who

is a descendant of 1198601

(2) there is at least one individual among 119860119894| 2 le 119894 le

119896 + 1 who is a descendant of 1198601

For (1) since no individual among 119860119894| 2 le 119894 le 119896 + 1 is a

descendant of 1198601 the set of path-triples from 119860

1to 119886 119887 and

119888 is independent of the set of path-triples from 119860119894| 2 le 119894 le

119896 + 1 to 119886 119887 and 119888 It also means that the contribution from

1198601toΦ119886119887119888

is independent of the contribution from the othertriple-common ancestors

Summing up all contributions we can obtainΦ119886119887119888(119866) =

Φ119886119887119888(1198661015840

) + 119878(1198601)

For (2) let119860119895be one descendant of119860

1 Now both119860

1and

119860119895can reach 119886 119887 and 119888119901119905119894= 119905119886 1198601rarr sdot sdot sdot rarr 119886 119905

119887 1198601rarr sdot sdot sdot rarr 119887 119905

119888 1198601rarr

sdot sdot sdot rarr 119888 a path-triple from 1198601to 119886 119887 and 119888

If 119905119886 119905119887 and 119905

119888all pass through119860

119895 then the path-triple119901119905

119894

is not an eligible path-triple for Φ119886119887119888

When we compute thecontribution from119860

1toΦ119886119887119888

we exclude all such path-tripleswhere 119905

119886 119905119887 and 119905

119888all pass through a lower triple-common

ancestor In other words an eligible path-triple from 1198601

regarding Φ119886119887119888

cannot have three paths all passing through alower triple-common ancestor Therefore we know that thatthe contribution from119860

1toΦ119886119887119888

is independent of the contri-bution from the other triple-common ancestors Summing upall contributions we obtainΦ

119886119887119888(119866) = Φ

119886119887119888(1198661015840

) + 119878(1198601)

C Proof for Four Individuals and TwoPairs of Individuals

Here we give a proof sketch for the correctness of pathcounting formulas for four individuals First of all for fourindividuals in a pedigree graph 119866 we present all differentcases based on which we construct a dependency graphThe correctness of the path-counting formulas for two-pairindividuals can be proved similarly

C1 Proof for Four Individuals Consider the existence ofdifferent types of path-quads regarding Φ

119886119887119888119889 Φ119886119886119887119888

andΦ119886119886119886119887

there are 15 cases for a pedigree graph 119866

Case 21 119866 has path-triples⟨1198751198601198861 1198751198601198862 119875119860119887⟩

with zero root overlapCase 22 119866 has path-triples

⟨1198751198601198861 1198751198601198862 119875119860119887⟩

with one root overlapCase 23 119866 has path-pairs

⟨119875119860119886 119875119860119887⟩

with zero root overlap

lArr997904 Φ119886119886119886119887

Computational and Mathematical Methods in Medicine 19

Case21

Case31 ΦAAA

ΦAAA

Case41

Case42

Case34ΦAA

Case32

Case331

Case22

Case23

Case431

Case35

Case432

Case4 33

Case332

Case333

Figure 23 Dependency graph for different cases for four individuals

Case 31 119866 has path-quads⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩

with zero root overlapCase 32 119866 has path-quads

⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩

with one root 2-overlapCase 331 119866 has path-quads

⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩

with two root 2-overlapCase 332 119866 has path-quads

⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩

with one root 3-overlapCase 333 119866 has path-quads

⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩

with one root 2-overlapand one root 3-overlap

Case 34 119866 has path-triples⟨119875119860119886 119875119860119887 119875119860119888⟩

with zero root overlapCase 35 119866 has path-triples

⟨119875119860119886 119875119860119887 119875119860119888⟩

with one root overlap

lArr997904 Φ119886119886119887119888

Case 41 119866 has path-quads⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩

with zero root overlapCase 42 119866 has path-quads

⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩

with one root 2-overlapCase 431 119866 has path-quads

⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩

with two root 2-overlapCase 432 119866 has path-quads

⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩

with one root 3-overlapCase 433 119866 has path-quads

⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩

with one root 2-overlapand one root 3-overlap

lArr997904 Φ119886119887119888119889

(C1)Then we construct a dependency graph shown in

Figure 23 for all cases for four individualsAccording to the dependency graph in Figure 23 the

intermediate steps including Cases 34 and 35 are already

proved for the computation of Φ119886119887119888

The correctness of thetransformation fromCase 42 to Case 34 can be proved basedon the recursive formula forΦ

119886119887119888119889andΦ

119886119886119887119888 Similarly we can

obtain the transformation from Case 431 to Case 35

C2 Proof for TwoPairs of Individuals Consider the existenceof different types of 2-pair-path-pair regarding Φ

119886119887119888119889 there

are 9 cases which are listed as follows

Case 41 119866 has ⟨(119875119860119886 119875119860119887) (119875119860119888 119875119860119889)⟩ with zero root homo-

overlap and zero root heter-overlap

Case 42 119866 has ⟨(119875119860119886 119875119860119887) (119875119860119888 119875119860119889)⟩ with zero root homo-

overlap and one root heter-overlap

Case 431 119866 has ⟨(119875119860119886 119875119860119887) (119875119860119888 119875119860119889)⟩ with zero root

homo-overlap and two root heter-overlap

Case 432 119866 has ⟨(119875119860119886 119875119860119887) (119875119860119888 119875119860119889)⟩ with one root

homo-overlap and two root heter-overlap

Case 44119866 has ⟨(119875119860119886 119875119860119887) (119875119860119888 119875119860119889)⟩ with one root homo-

overlap and zero root heter-overlap

Case 45 119866 has ⟨(119875119860119886 119875119860119887) (119875119860119888 119875119860119889)⟩ with two root homo-

overlap and zero root heter-overlap

Case 46 119866 has path-triples ⟨119875119860119886 119875119860119887 119875119860119888⟩ with zero root

overlapCase 47 119866 has path-triples ⟨119875

119860119886 119875119860119887 119875119860119888⟩ with one root

overlap

Case 48 119866 has path-pairs ⟨119875119879119888 119875119879119889⟩ with zero root overlap

Then we construct a dependency graph for the casesrelating to Φ

119886119887119888119889in Figure 24

According to the dependency graph in Figure 24Cases 46 47 and 48 are the intermediate steps whichalready are proved for the computation of Φ

119886119887119888 The

correctness of the transformation from Case 42 to Case 46can be proved based on the recursive formula for Φ

119886119887119888119889and

Φ119886119887119886119888

Similarly we can obtain the transformation fromCases 431 and 432 to Case 47 as well as from Case 44 toCase 48 accordingly

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

20 Computational and Mathematical Methods in Medicine

Case41

Case44

ΦAAA

Case42 Case46

Case48

ΦAA

ΦTT

Case431 Case47

Case432

ΦAAAA

Figure 24 Dependency graph for different cases for two pairs of individuals

Acknowledgments

The authors thank Professor Robert C Elston Case Schoolof Medicine for introducing to them the identity coefficientsand referring them to the related literature [7 10 17] Thiswork is partially supported by the National Science Founda-tionGrants DBI 0743705 DBI 0849956 andCRI 0551603 andby the National Institute of Health Grant GM088823

References

[1] Surgeon Generalrsquos New Family Health History Tool Is ReleasedReady for ldquo21st Century Medicinerdquo httpcompmedcomcate-gorypeople-helping-peoplepage7



18 Computational and Mathematical Methods in Medicine

Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Legend: parent-child relationship; ancestor-descendant relationship.)

Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. By induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.

Induction Step. We now show that the claim holds for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathit{Tri\_C}(a, b, c, G)$ denote the set of all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, that is, one such that no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. That is, $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
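The construction used in the induction step (collect the triple-common ancestors, pick a topmost one, and delete its outgoing edges to obtain $G'$) can be sketched in Python. This is only an illustration under stated assumptions: the pedigree is stored as a child-to-parents dictionary, a triple-common ancestor is taken to be any proper ancestor shared by all three individuals, and the names `ancestors`, `triple_common_ancestors`, `topmost`, and `remove_outgoing` are hypothetical helpers, not part of the paper.

```python
from typing import Dict, List, Set

# Assumed representation: each individual maps to its list of parents
# (founders map to an empty list).
Pedigree = Dict[str, List[str]]


def ancestors(ped: Pedigree, x: str) -> Set[str]:
    """Return all proper ancestors of x."""
    seen: Set[str] = set()
    stack = list(ped.get(x, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(ped.get(p, []))
    return seen


def triple_common_ancestors(ped: Pedigree, a: str, b: str, c: str) -> Set[str]:
    """Individuals that are ancestors of a, b, and c simultaneously."""
    return ancestors(ped, a) & ancestors(ped, b) & ancestors(ped, c)


def topmost(ped: Pedigree, common: Set[str]) -> str:
    """Pick a common ancestor that has no other common ancestor above it."""
    for x in sorted(common):
        if not (ancestors(ped, x) & common):
            return x
    raise ValueError("no triple-common ancestor")


def remove_outgoing(ped: Pedigree, top: str) -> Pedigree:
    """Build G' by dropping every parent-child edge that leaves `top`."""
    return {child: [p for p in ps if p != top] for child, ps in ped.items()}


# Hypothetical pedigree: A1, X, Y are founders; A2 is a child of A1 and X;
# a, b, c are children of A2 and Y.
ped = {
    "A1": [], "X": [], "Y": [],
    "A2": ["A1", "X"],
    "a": ["A2", "Y"], "b": ["A2", "Y"], "c": ["A2", "Y"],
}
common = triple_common_ancestors(ped, "a", "b", "c")
top = topmost(ped, common)
g_prime = remove_outgoing(ped, top)
common_after = triple_common_ancestors(g_prime, "a", "b", "c")
print(sorted(common), top, sorted(common_after))
```

In this toy pedigree $G$ has four triple-common ancestors and $G'$ has three, matching the step from $k + 1$ to $k$ in the argument.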

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Consider $pt_i = \{t_a\colon A_1 \to \cdots \to a;\ t_b\colon A_1 \to \cdots \to b;\ t_c\colon A_1 \to \cdots \to c\}$, a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
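The exclusion rule in case (2) can be made concrete with a small enumeration. The sketch below implements only the single rule stated in the proof (drop a path-triple when all three paths pass through a lower triple-common ancestor); any further eligibility conditions attached to formula (12) are not modelled. The parent-to-children dictionary and the function names `paths` and `eligible_triples` are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]


def paths(children: Dict[str, List[str]], src: str, dst: str) -> List[Path]:
    """Enumerate all directed paths src -> ... -> dst in a pedigree DAG."""
    if src == dst:
        return [(dst,)]
    found: List[Path] = []
    for child in children.get(src, []):
        for tail in paths(children, child, dst):
            found.append((src,) + tail)
    return found


def eligible_triples(
    children: Dict[str, List[str]],
    top: str,
    targets: Tuple[str, str, str],
    lower_common: Set[str],
) -> List[Tuple[Path, Path, Path]]:
    """Keep path-triples from `top` unless all three paths share a lower
    triple-common ancestor (the only rule taken from the proof)."""
    a, b, c = targets
    keep = []
    for ta, tb, tc in product(
        paths(children, top, a), paths(children, top, b), paths(children, top, c)
    ):
        # Nodes other than the root that lie on all three paths.
        shared = set(ta[1:]) & set(tb[1:]) & set(tc[1:])
        if not (shared & lower_common):
            keep.append((ta, tb, tc))
    return keep


# Hypothetical graph: A1 is a parent of A2 and of a; A2 is a parent of a, b, c.
children = {"A1": ["A2", "a"], "A2": ["a", "b", "c"]}
result = eligible_triples(children, "A1", ("a", "b", "c"), {"A2"})
print(result)
```

Of the two path-triples rooted at `A1` in this example, the one whose three paths all run through `A2` is dropped, so its contribution is counted only once, under `A2`.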

C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First, for four individuals in a pedigree graph $G$, we present all the different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ with zero root overlap.
(Cases 2.1 to 2.3) $\Longleftarrow \Phi_{aaab}$


Figure 23: Dependency graph for different cases for four individuals. (Nodes: Cases 2.1 to 2.3, 3.1 to 3.5, and 4.1 to 4.3.3, together with $\Phi_{AAA}$ and $\Phi_{AA}$.)

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with two root 2-overlap.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.
(Cases 3.1 to 3.5) $\Longleftarrow \Phi_{aabc}$

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with two root 2-overlap.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap and one root 3-overlap.
(Cases 4.1 to 4.3.3) $\Longleftarrow \Phi_{abcd}$

(C.1)

Then we construct the dependency graph shown in Figure 23 for all cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formula for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
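For reference, the two recursive formulas invoked in this step can be written as follows, as we recall them from Karigl [10]. The symbols $f$ and $m$ for the parents of $a$ are notation assumed for this sketch, and $a$ is assumed to be distinct from, and not an ancestor of, $b$, $c$, and $d$.

```latex
\begin{align*}
\Phi_{abcd} &= \tfrac{1}{2}\left(\Phi_{fbcd} + \Phi_{mbcd}\right), \\
\Phi_{aabc} &= \tfrac{1}{2}\left(\Phi_{abc} + \Phi_{fmbc}\right).
\end{align*}
```

In the second identity the two alleles drawn from $a$ coincide with probability $1/2$, which leaves the three-individual coefficient $\Phi_{abc}$. This reduction from four arguments to three is plausibly what links the four-individual cases to the path-triple Cases 3.4 and 3.5, although the full argument is not reproduced here.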

C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlap.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlap.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlap and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formula for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
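For reference, the recursive formulas for two pairs of individuals that this argument leans on can be written as follows, as we recall them from Karigl [10]. As before, $f$ and $m$ denote the parents of $a$ (notation assumed for this sketch), and $a$ is assumed to be distinct from, and not an ancestor of, the other individuals involved.

```latex
\begin{align*}
\Phi_{ab,cd} &= \tfrac{1}{2}\left(\Phi_{fb,cd} + \Phi_{mb,cd}\right), \\
\Phi_{ab,ac} &= \tfrac{1}{4}\left(2\,\Phi_{abc} + \Phi_{fb,mc} + \Phi_{mb,fc}\right).
\end{align*}
```

The term $2\,\Phi_{abc}$ in the second identity arises when the same allele of $a$ is drawn for both pairs, so that $a$, $b$, and $c$ must all share it. This is consistent with the path-triple Cases 4.6 and 4.7 appearing as intermediate steps in the dependency graph.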

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Figure 24: Dependency graph for different cases for two pairs of individuals. (Nodes: Cases 4.1, 4.2, 4.3.1, 4.3.2, 4.4, 4.6, 4.7, and 4.8, together with $\Phi_{AAAA}$, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{TT}$.)

Acknowledgments

The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.

References

[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.

[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.

[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.

[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.

[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.

[6] G. Malecot, Les mathematique de l'heredite, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.

[7] M. Gillois, "La relation d'identite en genetique," Annales de l'Institut Henri Poincare B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, "Logique du calcul des coefficients d'identite entre deux individuals," Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.



Page 20: Research Article Path-Counting Formulas for Generalized ...downloads.hindawi.com/journals/cmmm/2014/898424.pdf · includes the calculation of risk ratios for qualitative disease,

20 Computational and Mathematical Methods in Medicine

Case41

Case44

ΦAAA

Case42 Case46

Case48

ΦAA

ΦTT

Case431 Case47

Case432

ΦAAAA

Figure 24 Dependency graph for different cases for two pairs of individuals

Acknowledgments

The authors thank Professor Robert C Elston Case Schoolof Medicine for introducing to them the identity coefficientsand referring them to the related literature [7 10 17] Thiswork is partially supported by the National Science Founda-tionGrants DBI 0743705 DBI 0849956 andCRI 0551603 andby the National Institute of Health Grant GM088823

References

[1] Surgeon Generalrsquos New Family Health History Tool Is ReleasedReady for ldquo21st Century Medicinerdquo httpcompmedcomcate-gorypeople-helping-peoplepage7

[2] M Falchi P Forabosco E Mocci et al ldquoA genomewidesearch using an original pairwise sampling approach for largegenealogies identifies a new locus for total and low-density lipo-protein cholesterol in two genetically differentiated isolates ofSardiniardquoThe American Journal of Human Genetics vol 75 no6 pp 1015ndash1031 2004

[3] M Ciullo C Bellenguez V Colonna et al ldquoNew susceptibilitylocus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolaterdquo Human Molecular Genetics vol15 no 10 pp 1735ndash1743 2006

[4] Glossary of Genetic Terms National Human Genome ResearchInstitute httpwwwgenomegovglossaryid=148

[5] CW CottermanA calculus for statistico-genetics [PhD thesis]Columbus Ohio USA Ohio State University 1940 Reprintedin P Ballonoff Ed Genetics and Social Structure DowdenHutchinson amp Ross Stroudsburg Pa USA 1974

[6] G Malecot Les mathematique de lrsquoheredite Masson ParisFrance 1948 Translated edition The Mathematics of HeredityFreeman San Francisco Calif USA 1969

[7] M. Gillois, “La relation d’identité en génétique,” Annales de l’Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.

[8] D. L. Harris, “Genotypic covariances between inbred relatives,” Genetics, vol. 50, pp. 1319–1348, 1964.

[9] A. Jacquard, “Logique du calcul des coefficients d’identité entre deux individus,” Population, vol. 21, pp. 751–776, 1966.

[10] G. Karigl, “A recursive algorithm for the calculation of identity coefficients,” Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.

[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, “Efficient evaluation of inbreeding queries on pedigree data,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.

[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, “Efficiently calculating inbreeding on large pedigrees databases,” Information Systems, vol. 34, no. 6, pp. 469–492, 2009.

[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, “Using compact encodings for path-based computations on pedigree graphs,” in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB ’11), pp. 235–244, August 2011.

[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Scalable computation of kinship and identity coefficients on large pedigrees,” in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB ’08), pp. 27–36, 2008.

[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Efficient computation of kinship and identity coefficients on large pedigrees,” Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.

[16] S. Wright, “Coefficients of inbreeding and relationship,” The American Naturalist, vol. 56, no. 645, 1922.

[17] R. Nadot and G. Vaysseix, “Kinship and identity algorithm of coefficients of identity,” Biometrics, vol. 29, no. 2, pp. 347–359, 1973.

[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.

[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.

[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., “Data mining applied to linkage disequilibrium mapping,” The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.

[21] W. Boucher, “Calculation of the inbreeding coefficient,” Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
