imputing supertrees and supernetworks from quartets

Imputing Supertrees and Supernetworks from Quartets By B. Holland, G. Conner, K. Huber, and V. Moulton Presented by Razieh Nokhbeh Zaeem


Post on 04-Jan-2016



Page 1: Imputing Supertrees and Supernetworks from Quartets

Imputing Supertrees and Supernetworks from Quartets

By B. Holland, G. Conner, K. Huber, and V. Moulton

Presented by Razieh Nokhbeh Zaeem

Page 2: Imputing Supertrees and Supernetworks from Quartets

This talk
• Basic problem: constructing an estimate of a species phylogeny (in this case, a network) from a given set of gene trees
• Input: a set of partial gene trees (not all taxa appear in every tree)
• Output: a supernetwork, which allows conflicting signals to be displayed
• Algorithm by Holland et al.
– combines quartet imputation with consensus network construction
• Experiments comparing the new method with the earlier Z-closure method and with MRP with respect to “false positives” and “false negatives”
– Q-imputation provides a useful complementary tool

Page 3: Imputing Supertrees and Supernetworks from Quartets

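Q-imputation
• Some definitions: L(T) (the leaf set of T), T|Z (T restricted to the taxa in Z), Q(T) (the quartets displayed by T) and …
• Let T_1, …, T_g be the collection of input trees, corresponding to a collection of gene trees.
• Put X = L(T_1) ∪ … ∪ L(T_g), the set of all taxa.
• For each tree T_i, we sequentially insert all of the taxa in X − L(T_i) into T_i to get a completed tree T_i'.
• Once we have all the T_i', we apply the consensus network method to obtain a network.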

Page 4: Imputing Supertrees and Supernetworks from Quartets

For each input tree T_i:

For each new taxon y: find a place to add a pendant edge labeled by y

We are trying to choose a place p such that it maximizes the number of quartets on which the resulting tree agrees with all the other input trees

Choose randomly if more than one place gives the best score

If the max score is 0, we don’t have enough information

Polynomial time alg:
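The insertion step can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: a tree is stored as a set of splits (each split an unordered pair of frozensets), the score counts only quartets that contain the new taxon y, and all function names (`split`, `quartets`, `insert_leaf`, `best_insertions`, `place_taxon`) are hypothetical. Returning `None` when the best score is 0 is also an assumption about how "not enough information" is handled.

```python
import random
from itertools import combinations

def split(side, leaves):
    """A split A|B of `leaves`, stored as an unordered pair of frozensets."""
    side = frozenset(side)
    return frozenset({side, frozenset(leaves) - side})

def quartets(splits, leaves):
    """All resolved quartets ab|cd displayed by a tree given as a set of splits."""
    found = set()
    for four in combinations(sorted(leaves), 4):
        four = frozenset(four)
        for s in splits:
            side = next(iter(s)) & four
            if len(side) == 2:
                found.add(frozenset({side, four - side}))
    return found

def insert_leaf(splits, leaves, edge, y):
    """Attach y by a pendant edge in the middle of `edge` (one split of the tree)."""
    new_leaves = frozenset(leaves) | {y}
    a, b = tuple(edge)
    out = {split(a, new_leaves), split(b, new_leaves), split({y}, new_leaves)}
    for s in splits - {edge}:
        c, d = tuple(s)
        # y joins the side of s that contains the subdivided edge
        grown = c if (d <= a or d <= b) else d
        out.add(split(grown | {y}, new_leaves))
    return out

def best_insertions(splits, leaves, y, other_quartets):
    """Try every edge; return (best score, list of candidate trees reaching it)."""
    best_score, best = -1, []
    for edge in splits:
        cand = insert_leaf(splits, leaves, edge, y)
        # keep only the quartets that involve the newly inserted taxon
        qs = {q for q in quartets(cand, set(leaves) | {y})
              if any(y in side for side in q)}
        score = sum(len(qs & oq) for oq in other_quartets)
        if score > best_score:
            best_score, best = score, [cand]
        elif score == best_score:
            best.append(cand)
    return best_score, best

def place_taxon(splits, leaves, y, other_quartets):
    """Pick one best placement at random; None if the best score is 0 (assumption)."""
    score, best = best_insertions(splits, leaves, y, other_quartets)
    return random.choice(best) if score > 0 else None
```

For instance, inserting F into the quartet tree AB|CD when another input tree displays FA|BC leaves a single best placement: F is attached to the pendant edge of A, giving the split AF|BCD.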

Page 5: Imputing Supertrees and Supernetworks from Quartets

An example – insert F into

FB|AD  FB|AE  FB|DE  FA|DE

FA|CE  FB|AC  FB|AE  FB|CE

FD|BC

Page 6: Imputing Supertrees and Supernetworks from Quartets

The consensus network
• The consensus network (the split network): those splits of X that are displayed by more than a certain proportion, t (given here as a percentage), of the trees computed by Q-imputation
• In case t = 0 we drop the subscript t: splits which appear at least once
• For example:
– If t = 100 (splits displayed by all of the trees), then the consensus network is the strict consensus tree
– If t = 50, then the consensus network is the majority-rule consensus tree
– If t < 50, then the consensus network may display conflicting splits
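The thresholding rule can be illustrated with a short Python sketch. It assumes each completed tree is given as a set of hashable splits (plain strings here, purely for illustration) and that t is a percentage; the function name `consensus_splits` is hypothetical. Treating t = 100 as "displayed by every tree" is an assumption made so that the strict-consensus case matches the slide.

```python
from collections import Counter

def consensus_splits(trees, t):
    """Splits displayed by more than t percent of the trees.

    trees: list of sets of hashable splits; t: threshold in percent.
    Assumption: t >= 100 is read as "displayed by all trees" (strict consensus).
    """
    counts = Counter(s for tree in trees for s in set(tree))
    n = len(trees)
    if t >= 100:
        return {s for s, c in counts.items() if c == n}
    return {s for s, c in counts.items() if 100 * c > t * n}

# Three toy trees on taxa A..E, written as sets of nontrivial splits.
trees = [
    {"AB|CDE", "ABC|DE"},
    {"AB|CDE", "ABC|DE"},
    {"AC|BDE", "ABC|DE"},
]
all_splits = consensus_splits(trees, 0)     # every split seen at least once
majority = consensus_splits(trees, 50)      # majority-rule splits
strict = consensus_splits(trees, 100)       # splits shared by all trees
```

With t = 0 the result contains both AB|CDE and AC|BDE, which conflict, so it can only be drawn as a network rather than a tree.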

Page 7: Imputing Supertrees and Supernetworks from Quartets

Simulation
• Three different types of input (3 types of simulations):
1. Evolution is tree-like. Gene trees are correct, but miss taxa
2. Evolution is tree-like. Gene trees have errors and miss taxa
3. Evolution is not tree-like. Random input trees.
• In each simulation, three parameters were varied:
A. The species tree, either
• the completely balanced tree on 16 taxa or
• the completely unbalanced tree on 16 taxa
B. g (the number of gene trees), taking values 2, 4, 8, 16, and 32
C. m (the number of taxa missing, deleted randomly), taking values 1, 2, 3, 4, 5, and 6
• One hundred repetitions were carried out for each parameter combination.

Page 8: Imputing Supertrees and Supernetworks from Quartets

Simulation
• The split systems generated were:
A. MRP: … and …, the splits in the majority-rule consensus and strict consensus from MRP.
B. Q-imputation: …, … and …
C. Z-closure: the splits generated using Z-closure
• Measuring FP and FN
– FP (false positives): splits contained in the output split system that are not in the input
– FN (false negatives): splits in the input that are not in the output split system
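The two error counts are plain set differences. The sketch below assumes that input and output splits are expressed on the same taxon set and stored as hashable values; the function name `fp_fn` and the toy splits are hypothetical.

```python
def fp_fn(input_splits, output_splits):
    """Return (false positives, false negatives) as sets of splits.

    Assumption: both collections use the same taxon set and the same
    canonical, hashable representation for a split.
    """
    input_splits, output_splits = set(input_splits), set(output_splits)
    fp = output_splits - input_splits   # in the output but not in the input
    fn = input_splits - output_splits   # in the input but missing from the output
    return fp, fn

fp, fn = fp_fn({"AB|CDE", "ABC|DE"}, {"AB|CDE", "AC|BDE"})
```

Here `fp` holds AC|BDE and `fn` holds ABC|DE.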

Page 9: Imputing Supertrees and Supernetworks from Quartets

WIP
• Definition: weak induction property (WIP):
– For input trees T_1, …, T_g, any split S in the output split system should restrict to a split in T_i for some i
– The WIP holds for all splits in the output in case the input trees are all subtrees of a phylogenetic tree.
– There are examples where the WIP does not hold, although very few are generated by Q-imputation.
• Z-closure satisfies the WIP
• Any method with the WIP cannot generate FP: every split in the output has come from some tree in the input set, so there is no split which appears in the output but not in the input.
• Q-imputation with t = 0 cannot produce FN
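A WIP check can be sketched by restricting each output split to the leaf set of each input tree. This is an illustrative reading of the definition, with hypothetical names (`make_split`, `restrict`, `satisfies_wip`): a restriction with an empty side is not treated as a split, and trivial splits count only if the caller includes them in a tree's split set.

```python
def make_split(a, b):
    """A split a|b as an unordered pair of frozensets."""
    return frozenset({frozenset(a), frozenset(b)})

def restrict(split, leaves):
    """Restrict a split to `leaves`; None if one side becomes empty."""
    sides = [side & frozenset(leaves) for side in split]
    if not all(sides):
        return None
    return frozenset(sides)

def satisfies_wip(output_splits, input_trees):
    """input_trees: list of (leaf set, set of splits) pairs.

    True if every output split restricts to a split of at least one input tree.
    """
    return all(
        any(restrict(s, leaves) in tree_splits for leaves, tree_splits in input_trees)
        for s in output_splits
    )

# One input tree on {A, B, C, D} displaying AB|CD.
tree = ("ABCD", {make_split("AB", "CD")})
ok = satisfies_wip({make_split("ABE", "CDF")}, [tree])    # restricts to AB|CD
bad = satisfies_wip({make_split("ACE", "BDF")}, [tree])   # restricts to AC|BD
```

The first output split is induced by the input tree, while the second restricts to AC|BD, which no input tree displays.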

Page 10: Imputing Supertrees and Supernetworks from Quartets

Simulation results: FP

• Z-closure cannot generate FP, so we just look at splits in Q-imputation and MRP.
– 6000 different settings for each type of simulation.
– Normalized numbers in parentheses.
– Each tree on 16 taxa, 13 internal edges.

Type           Method
Simulation 1   0            0            0    0              0
Simulation 2   36 (0.006)   35 (0.006)   0    87 (0.015)     46 (0.008)
Simulation 3   56 (0.009)   52 (0.009)   0    5252 (0.875)   4368 (0.728)

Page 11: Imputing Supertrees and Supernetworks from Quartets

Simulation 1 results: FN, normalized, %

        Z-closure                           |  Q-imputation_20                    |  MRP_50
g \ m      1     2     3     4     5     6 |     1     2     3     4     5     6 |     1     2     3     4     5     6
(1b)
2       0.01  0.17  0.30  0.51  0.63  0.92 |  0.00  0.00  0.00  0.00  0.00  0.00 |  0.00  0.00  0.00  0.00  0.01  0.06
4       0.01  0.01  0.05  0.13  0.32  0.41 |  0.00  0.00  0.00  0.00  0.00  0.00 |  0.00  0.00  0.00  0.00  0.00  0.00
8       0.00  0.00  0.00  0.00  0.02  0.05 |  0.00  0.00  0.00  0.00  0.00  0.00 |  0.00  0.00  0.00  0.00  0.00  0.00
16      0.00  0.00  0.00  0.00  0.00  0.00 |  0.00  0.00  0.00  0.00  0.00  0.00 |  0.00  0.00  0.00  0.00  0.00  0.00
32      0.00  0.00  0.00  0.00  0.00  0.00 |  0.00  0.00  0.00  0.00  0.00  0.00 |  0.00  0.00  0.00  0.00  0.00  0.00
(1u)
2       0.01  0.06  0.13  0.23  0.44  0.78 |  0.00  0.00  0.00  0.00  0.00  0.00 |  0.00  0.00  0.00  0.01  0.01  0.07
4       0.00  0.00  0.06  0.07  0.15  0.26 |  0.00  0.00  0.00  0.00  0.00  0.00 |  0.00  0.00  0.00  0.00  0.00  0.01
8       0.00  0.00  0.00  0.02  0.03  0.05 |  0.00  0.00  0.00  0.00  0.00  0.00 |  0.00  0.00  0.00  0.00  0.00  0.00
16      0.00  0.00  0.00  0.00  0.00  0.01 |  0.00  0.00  0.00  0.00  0.00  0.00 |  0.00  0.00  0.00  0.00  0.00  0.00
32      0.00  0.00  0.00  0.00  0.00  0.00 |  0.00  0.00  0.00  0.00  0.00  0.00 |  0.00  0.00  0.00  0.00  0.00  0.00

Page 12: Imputing Supertrees and Supernetworks from Quartets

Simulation 2 results: FN, normalized, %

        Z-closure                           |  Q-imputation_20                    |  MRP_50
g \ m      1     2     3     4     5     6 |     1     2     3     4     5     6 |     1     2     3     4     5     6
(2b)
2       0.04  0.16  0.27  0.48  0.65  0.77 |  0.00  0.00  0.00  0.00  0.00  0.00 |  0.61  0.54  0.34  0.30  0.17  0.07
4       0.00  0.05  0.07  0.14  0.10  0.30 |  0.00  0.00  0.00  0.00  0.00  0.00 |  1.67  1.45  1.40  1.16  1.04  0.81
8       0.00  0.00  0.00  0.00  0.00  0.01 |  2.89  2.59  2.49  2.06  1.81  1.42 |  3.30  3.01  2.81  2.47  2.23  1.91
16      0.00  0.00  0.00  0.00  0.00  0.00 |  6.49  6.00  5.32  4.96  4.42  3.62 |  6.56  6.02  5.38  5.03  4.45  3.77
32      0.00  0.00  0.00  0.00  0.00  0.00 | 13.13 12.16 11.15  9.83  8.66  7.61 | 13.13 12.19 11.15  9.84  8.67  7.62
(2u)
2       0.37  0.59  0.58  0.54  0.54  0.70 |  0.00  0.00  0.00  0.00  0.00  0.00 |  0.92  0.84  0.58  0.41  0.22  0.08
4       0.28  0.40  0.44  0.46  0.37  0.48 |  0.00  0.00  0.00  0.00  0.00  0.00 |  2.37  2.09  1.81  1.38  1.11  0.89
8       0.23  0.23  0.18  0.13  0.09  0.15 |  3.78  3.33  2.86  2.33  1.90  1.52 |  4.46  3.98  3.29  2.81  2.42  1.97
16      0.08  0.08  0.07  0.04  0.04  0.01 |  8.97  7.53  6.69  5.62  4.71  3.86 |  9.04  7.64  6.74  5.69  4.82  3.97
32      0.01  0.01  0.00  0.01  0.00  0.00 | 18.09 15.50 13.94 11.56  9.59  7.98 | 18.10 15.52 13.95 11.56  9.62  8.05

Page 13: Imputing Supertrees and Supernetworks from Quartets

Simulation 3 results: FN, normalized, %

        Z-closure                           |  Q-imputation_20                    |  MRP_50
g \ m      1     2     3     4     5     6 |     1     2     3     4     5     6 |     1     2     3     4     5     6
(3)
2       0.48  0.88  0.80  0.96  0.82  0.67 |  0.00  0.00  0.00  0.00  0.00  0.00 |  2.15  1.87  1.34  0.99  0.56  0.18
4       0.07  0.23  0.31  0.41  0.56  0.66 |  0.00  0.00  0.00  0.00  0.00  0.00 |  5.57  4.92  4.27  3.61  2.96  2.30
8       0.00  0.00  0.00  0.01  0.07  0.08 | 11.38 10.09  9.02  7.64  6.53  5.21 | 11.95 10.76  9.72  8.34  7.26  5.95
16      0.00  0.00  0.00  0.00  0.00  0.00 | 25.36 22.89 20.42 18.36 16.09 13.90 | 24.61 22.45 20.22 17.98 15.74 13.41
32      0.00  0.00  0.00  0.00  0.00  0.00 | 51.85 46.86 42.29 37.80 33.50 29.21 | 50.05 45.77 41.52 37.16 32.74 28.31

Page 14: Imputing Supertrees and Supernetworks from Quartets

Discussion on simulation results
• By increasing the # of gene trees:
– FN produced by Z-closure reduces (good)
– FN produced by Q-imputation increases (bad)
• As a supertree method (simulations 1 & 2), Q-imputation tended to return fewer FP (unsupported) splits, but also fewer supported splits (more FN (?)) than MRP
• As a supernetwork method, Q-imputation tended to give rise to FP but not FN (?), whereas Z-closure gave rise to FN but no FP
• Also, in simulations where there was an underlying species tree, while increasing the number of gene trees:
– For Z-closure the number of FN increased (?)
– For the split system derived from applying a threshold to the trees completed by Q-imputation, the number of FN had the desirable property of decreasing (?)
• For the output to be visually palatable, we need to have some FN to restrict the number of splits that are being displayed.
– Q-imputation: a natural means to filter out splits.
– Look at case study.

Page 15: Imputing Supertrees and Supernetworks from Quartets

Case study

7 genes, 45 taxa

Q-imputation Z-closure