university of missouridslsrv1.rnet.missouri.edu/~shangy/thesis/yanchen2013.docx · web viewdr. yi...

117
MULTI-DIMENSIONAL SCALING AND MODELLER BASED EVOLUATIONARY ALGORITHMS FOR PROTEIN MODEL REFINEMENT A Thesis Presented to the Faculty of the Graduate School at the University of Missouri-Columbia In Partial Fulfillment of the Requirements for the Degree Master of Science by Yan Chen

Upload: others

Post on 05-Feb-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

MULTI-DIMENSIONAL SCALING AND MODELLER BASED EVOLUATIONARY ALGORITHMS FOR PROTEIN MODEL REFINEMENT

A Thesis

Presented to

the Faculty of the Graduate School

at the University of Missouri-Columbia

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

by

Yan Chen

Dr. Yi Shang, Thesis Supervisor

December 2013

The undersigned, appointed by the dean of the Graduate School, have examined the thesis entitled

MULTI-DIMENSIONAL SCALING AND MODELLER BASED EVOLUATIONARY ALGORITHMS FOR PROTEIN MODEL REFINEMENT

Presented by Yan Chen,

A candidate for the degree of Master of Computer Science,

And hereby certify that, in their opinion, it is worthy of acceptance.

Professor Yi Shang

Professor Dong Xu

Professor Ioan Kosztin

Acknowledgement

Firstly, I would like to thank my Holy Father who loves me and gives me strength to overcome every difficulty.

I am sincerely presenting deep appreciation to my advisor, Dr. Yi Shang, who continuously guides me in every stage of my M.S. study with great patience. His spirit of hard working and strictness in academy impresses me much and this will be a great treasure in all my life.

I would also like to thank the committee members for their guidance and participation during my M.S. study. Great gratitude is also given to my parents and my husband who love and encourage me to overcome difficulties in study and life. Many thanks are also given to my officemates, Son P. Nguyen, Jingfen Zhang, Zhiquan He, Hongbo Li, Jiong Zhang, Junlin Wang and Wei Xiong for their help and support in the completion of this thesis.

Table of contents

AcknowledgementII

List of figuresV

List of tablesVII

AbstractVIII

Chapter 1: INTRODUCTION1

Chapter 2: RELATED WORK5

2.1 Evolutionary Algorithms5

2.2 Protein Quality Evaluation6

2.2.1 Similarity Consensus7

2.2.2 Energy or scoring functions9

2.3 MDS11

2.4 MODELLER12

Chapter 3: THREE NEW EVOLUTIONARY ALGORITHMS FOR PROTEIN MODEL REFINEMNT13

3.1 Initial Population15

3.2 Evaluation16

3.3 Selection18

3.4 Crossover20

3.4.1 MDS-based method20

3.4.2 MODELLER-based method23

3.4.3 Hybrid method23

Chapter 4: EXPERIMENTAL Results25

4.1 Data Set25

4.2 Computation Time27

4.3 Results30

4.3.1 Refinement results using ProQ2 as quality assessment method30

4.3.2 Refinement results using CGDT-TS as quality assessment method35

Chapter 5: CONCLUSIONS AND FUTURE WORK41

REFERENCES43

List of figures

Figure 2. 1 Implementation of evolutionary algorithm6

Figure 3. 1 The evolutionary algorithm of the whole process14

Figure 3. 2 ProQ2 and GDT-TS of initial population for T065417

Figure 3. 3 CGDT-TS and GDT-TS of initial population for T065417

Figure 3. 4 Selection probability of ProQ2 for T065419

Figure 3. 5 Selection probability of CGDT-TS for T065419

Figure 3. 6 The WMDS-based crossover (WMDS-C) algorithm20

Figure 3. 7 The MODELLER-based crossover (MODELLER-C) algorithm23

Figure 3. 8 The Hybrid crossover (Hybrid-C) algorithm24

Figure 4. 1 The all computation time for different algorithms28

Figure 4. 2 The average iteration time in each step for different algorithms29

Figure 4. 3 The computation time for steps in WMDS-based EA of each iteration30

Figure 4. 4 The GDT-TS of best model using ProQ2 evaluation33

Figure 4. 5 The RMSD of best model using ProQ2 evaluation33

Figure 4. 6 The average GDT-TS of best 10 models using ProQ2 evaluation34

Figure 4. 7 The average GDT-TS of all models using ProQ2 evaluation35

Figure 4. 8 The GDT-TS of best model using CGDT-TS evaluation37

Figure 4. 9 The RMSD of best model using CGDT-TS evaluation38

Figure 4. 10 The average GDT-TS of best 10 models using CGDT-TS evaluation38

Figure 4. 11 The average GDT-TS of all models using CGDT-TS evaluation39

Figure 4. 12 The average improved GDT-TS for different algorithms40

V

List of tables

Table 4. 1 Summary of CASP10 refinement targets26

Table 4. 2 Summary of refinement results on CASP10 targets using ProQ232

Table 4. 3 Summary of refinement results on CASP10 targets using CGDT-TS36

Abstract

To computationally obtain an accurate prediction of the three-dimensional structure of a protein from its primary sequence is one of the most important problems in bioinformatics and has been actively researched for many years. Although a number of software packages have been developed and they sometimes perform well on template-based modeling, further improvement is needed for practical use. Model refinement is a step in the prediction process, in which improved structures are constructed based on a pool of initially generated models. Since the refinement category being added to the Critical Assessment of Structure Prediction (CASP) competition in 2008, CASP results show that it is a challenge for existing model refinement methods to improve model quality consistently.

This project focuses on evolutionary algorithms for protein model refinement. Three new algorithms have been developed, in which multidimensional scaling (MDS), MODELLER, and a hybrid of both are used as crossover operators, respectively. The MDS-based method takes a purely geometrical approach and generates a child model by combining the contact maps of multiple parents. The MODELLER-based method takes a statistical and energy minimization approach and uses the remodeling module in MODELLER program to generate new models from multiple parents. The hybrid method first generates models using the MDS-based method and then run them through the MODELLER-based method, aiming at combining the strength of both. Promising results have been obtained in experiments using CASP datasets. The MDS-based method improved the best of a pool of predicted models in terms of the global distance test score (GDT-TS) in 9 out of 16 test targets. For instance, for target T0680, the GDT-TS of a refined model is 0.833, much better than 0.763, the value of the best model in the initial pool.

Keywords: MDS, MODELLER, Evolutionary Algorithm

VII

Chapter 1: INTRODUCTION

Proteins are essential biochemical compounds that contribute to many processes in life. Functional properties of cells are dependent on the correct protein structure [1]. Misfolded proteins may lead to some diseases, such as Alzheimer’s, Parkinson’s, Type II Diabetes, even certain cancers [2] [3]. The knowledge of the protein tertiary structure can help in basic research on protein functions as well as in drug development. Both experimental methods and computational methods can be used in protein structure acquirement. X-ray crystallography is the most wildly used method to finding the protein structures in experimental methods. Quaternary structure of large proteins can be determined by electron microscopes (cryoEM) or nuclear magnetic resonance (NMR). However, it is costly, slow and difficult to find the protein tertiary structures through experimental technologies [4]. So better computational techniques are quit desirable for predicting protein structure from primary sequences information. Therefore obtaining an accurate prediction of the three-dimensional structure of a protein by automatic prediction is one of the most important problems in bioinformatics and has been actively researched for many years.

The CASP is a biannual world wild contest in the structure prediction community to assess the current protein modeling techniques and identify their quality. There are different groups trying to predict structure of a protein whose structure is unreleased to the outside world or refine an existing model close to native structure. Evaluating the models’ quality and refining the existing models are the two main problems of predicting the three dimensional protein models.

Model refinement is a step in the prediction process, in which improved structures are constructed based on a pool of initially generated models. The refinement category has been added to the CASP competition since 2008 to evaluate the current technology for further improvement to the best predicted models. Participants can focus on the best submitted protein models, sometimes the certain part of a protein needs to refinement or the accuracy of the protein models are provided. The aim of protein refinement is to move the backbone or the side chain conformations closer to native. Several methods were explored and software packages were developed to improve the quality of protein structure models. i3Drefine software is the only fully-automated server which can improve the global and local structure quality in CASP10. This new technology minimizes the energy iteratively with physics and knowledge-based force fields, and hydrogen bonding (HB) network optimization technique [5]. GalaxyRefine has the best performance in CASP10 by improving the local structure quality. It rebuilt the side chains, repacked them, and relaxed the structure by molecular dynamics simulation [6]. KoBaMIN is another method based on minimization of a knowledge-based potential of mean force [7]. I-TASSER is an automated pipeline for predicting protein 3D structure by multiple threading alignments and iterative structure assembly simulations [8]. Methods mentioned in references [9] and [10] also can improve the initial model structure.

Although a number of software packages have been developed and they sometimes perform well on template-based modeling, further improvement is needed for practical use. During CASP8 and CASP9 [11] [12], only a few groups were able to improve the protein model quality consistently. In CASP10, seven groups were better than naïve (null) method, only two or three groups were significantly better. Most groups are still unable to improve the quality of the starting models; even high-performing groups were only able to make modest improvements; and no predicted models were more similar to the native structure than to the starting model. The maximum improvement in high accuracy of GDT-TS (GDT-HA) is only about 0.1 [13].

In this project, three new evolutionary algorithms which are MDS-based, MODELLER-based and a hybrid of MDS-based and MODELLER-based crossover have been developed. The MDS-based method is a purely geometrical approach that combines the contact maps of multiple parents to breed a new offspring. The MODELLER-based method takes a statistical and energy minimization approach and uses the remodeling module in MODELLER program to generate new models from multiple parents. The hybrid method first generates models using the MDS-based method and then runs them through the MODELLER-based method, aiming at combining the strength of both.

This thesis is started with some related work for this project of protein tertiary structure refinement in Chapter 2. Chapter 3 presents the three evolutionary algorithms to refine protein models and Chapter 4 describes the refinement results. Some conclusions and future work are shown in Chapter 5.

Chapter 2: RELATED WORK2.1 Evolutionary Algorithms

Evolutionary Algorithms simulate the natural evolution progress [14]. This computational method is based on evolutionary and biological principles to achieve optimal results. The size of the population is invariable by filtering the reproduction and selecting the better survivals. This evolutionary algorithm uses a fitness function that represents the quality of the population as a number. Several evolutionary algorithms are used for solving the protein model predication problems [15] [16] [17] [18]. Figure 2.1 shows the general process of an evolutionary algorithm.

Step 1. Initial Population

Generate the initial population randomly

Step 2. Fitness Function Evaluation

Evaluate the fitness of each individual in the pool

Step 3. Repeat Generation until Termination

Step 3.1. Selection

Select the best-fit individuals as parents

Step 3.2. Reproduction

Breed new generation by crossover or mutation

Step 3.3. Evaluation

Evaluate the individual’s fitness of new generation

Step 3.4. Replacement

Knock out least-fit individuals

Figure 2. 1 Implementation of evolutionary algorithm

2.2 Protein Quality Evaluation

Picking the most suitable candidates from the template is essential to model refinement. The basic hypothesis of protein models is that the native structure has the minimum free energy in general [1]. Using energy or scoring functions is a most adoptable approach for evaluating the models quality. An accurate energy function could help to find the best prediction protein models. There are two general categories of energy functions: one is physics-based potential and the other is knowledge-based potential [19]. Physics-based potential function uses physics law to evaluate the models’ quality. The knowledge-based statistical method is based on the thermodynamic equilibrium and known information of protein models. The second approach is clustering based. This theory assumes that good models have more structural neighbors. The third approach is using consensus information. The major selection is based on pairwise similarity between structures [20].

2.2.1 Similarity Consensus

The pairwise similarity could be retrieved from the root-mean-squared deviation (RMSD), the template Modeling Score (TM-score) and the total score of global distance test (GDT-TS) between each protein model pairs etc.[2].

RMSD is the measure of average distance between the C-α atoms of two protein models. RMSD is commonly used in the study of protein conformations; it measures the similarity in three-dimensional structure of the C-α atomic coordinates. Low RMSD score indicates two protein models are similar. Given two sets of n points v and w, the RMSD is defined as follows by Eq. (1):

(1)

TM-score [21] [22] is an algorithm to measure the similarity of topologies between two protein models. Because TM-score considers the close matches more than the distant matches, TM-score is more accurate than RMSD. The range of TM-score is (0,1] for each pairs comparison, where 1 means a perfect match between two protein models.

GDT-TS [23] is a global quality measure of the correct positioning of amino acid sequences between two protein models. It is the most commonly parameter used to evaluate the quality of a predicted protein model with the experimentally determined structure. It is also widely accepted as a quality assessment of protein models prediction tool in CASP competition. The GDT-TS score is more sensitive to outlier regions of a protein model, so it is more accurate than RMSD score. The GDT-TS score is calculated by averaging percentage of residues with C-α atom distance in the model structure within certain distance cutoff of their positions. GDT-TS ranges from 0 to 1 with higher value indicating better accuracy. GDT-HA is a high accuracy version of the GDT score [24]. It is more rigorous and uses smaller cutoff distance, only half the size of GDT-TS.

GDT-TS is used as major assessment criteria in CASP competition to compare the predicted model with native structure. The GDT is defined as follow:

(2)

Where and are two protein 3D structures and is the percentage of amino acid residues' alpha carbon atoms from that can be superimposed with corresponding residues from within a defined distance cutoff , [20].

However, in the refinement procedure, the knowledge of native structure is not exposed. For predicted model’s evaluation, most groups used consensus GDT-TS (CGDT-TS) ,which considers the similarity between each pair of models. Given a set of prediction structures set S and a reference structure set R, the CGDT-TS score for each prediction structure Si is defined as:

(3)

The reference structure set R can be the same structure set as S. The Eq. (2) and (3) indicates that the range of CGDT-TS is [0, 1]. The larger the CGDT score, the more similarity between two protein models. In this thesis, TM-score software developed by Yang Zhang’s group is used to compute the CGDT-TS [21].

2.2.2 Energy or scoring functions

OPUS_Cα [25] is a knowledge-based potential function, only using the information of C-α position. This software is based on seven major representative molecular interactions in proteins: distance-dependent pairwise energy with orientation preference, hydrogen bonding energy, short-range energy, packing energy, tri-peptide packing energy, three-body energy, and salvation energy. dDFIRE [26] program treats each polar atom as a dipole and is based on the orientation angles in dipoles interactions and distance between two atoms dipoles. This approach considers the hydrogen bonding interaction as the physical dipole-dipole interaction. The possible orientation-dependent interactions between polar and nonpolar atoms and interactions between non-hydrogen-bonded polar atoms are treated fairly. This method produces an all-atom parameter-free statistical energy function. The calRW was developed by Zhang and Zhang’s group [27] by involving two functions: the first function is a pairwise distance-dependent atomic statistical potential function using an ideal random walk chain as reference state; the second function is using a side chain orientation-dependent energy function. GOAP [28] is a generalized orientation and distance-dependent all-atom statistical potential that is determined by the relative orientation of the planes, which relate on each heavy atom in interacting pairs. This algorithm only considers the distance and angle dependent between representative atoms or blocks of side-chain or polar atoms.

ProQ2 as the best single-model method in CSAP10 is a model quality assessment program, which could be used to evaluate the model’s quality or as a scoring function for sampling-based refinement. This model quality assessment algorithm uses support vector machines to validate each residue quality and the global quality of protein models. It combines previously used features with updated structural and predicted features to evaluate the predicted models [29].

For each individual residue, ProQ2 uses S-score which is the transformation of the normal RMSD to predict its quality. S-score is described by the following formula:

(4)

where RMSDi is the local RMSD deviation for residue i based on a global superposition trying to maximize essentially the sum of S-score over the whole model.

2.3 MDS

Proteins’ three dimensional structures determine the functions of a protein and amino acid sequence. A protein model can be presented as a result of spatial contacts between amino acids. Neighbors in sequence are not corresponding to the nearest neighbors in space. The contact map of a protein is a two dimensional matrix in which each value represents the distance between all amino acid residue pairs of the 3D protein models. For a protein with n residues, the contact map is n × n symmetrical matrix, the distance of ( i , j ) is range from 0 to 1. “1” means to residue i and j are close and “0” means far. In certain conditions the contact map can reconstruct the 3D coordinates of a protein [30]. Contact map predictions have been used in the modeling of protein 3D structures [31] [32] [33] [34].

Metric multi-dimensional scaling (MDS) [35] [36] [37], also called Principal Coordinates Analysis (PCO), is a dimensional reduction technology which aims to identify patterns in a distance matrix. Using MDS, a 3D protein model could be constructed. Multi-dimensional scaling tries to find the low dimensional space to represent the distances between points in space that match the dissimilarities [38].

MDS can be used to compare orthologous sequence sets [39] and predict the protein models binding [40], achieve better, clash-free placement of loops obtained from a database of protein models [41].

2.4 MODELLER

MODELLER is a computer program for comparing protein models modeling by satisfaction of spatial restraints which are deduced from homology to template structures and energy objective functions [42] [43].

The basic inputs include an alignment of a sequence, the atomic coordinates of the templates and a script file. MODELLER calculates many distances and angle restraints from the alignment with the template tertiary structure and generates a model containing all non-hydrogen atoms within minutes. Except model building, MODELLER could be used in fold assignment, multiple alignments of protein sequences and calculation of phylogenetic trees, etc. [44].

The spatial restraints [44] include homology-derived restraints, stereo chemical restrains, statistical preferences for dihedral angles and non-bonded inter-atomic distances, and optional manually curated restraints. Those restrains presented as probability density functions are optimized by a combination of conjugate gradients and molecular dynamics with simulated annealing considering them altogether as a function.

Chapter 3: THREE NEW EVOLUTIONARY ALGORITHMS FOR PROTEIN MODEL REFINEMNT

The evolutionary algorithm begins by creating a population of individuals, where the user defines the number of individuals in the population. The creation of the initial population is followed by loop over generations, repeating until the maximum number of generations is attained. A generation consists of three stages: protein model quality evaluation, selection, and crossover.

The following Fig. 3.1 shows the flow chart for the whole process.

Figure 3. 1 The evolutionary algorithm of the whole process

3.1 Initial Population

More than 100 research groups participated in CASP competition all over the world. First of all, the host sends out the identified sequences to each group; then each group uses different algorithms to generate the 3D structure partially or fully. Finally, at most, five decoys can be submitted to the CASP center. There are about 250 prediction structures for each protein target.

Some predicted structures generated from the same group or the similar algorithms are nearly identical to each other. For the diversity of the original pool, those similar models will be removed before calculating. In the previous study of our research group [45], if the similarity between two models is equal to 1 by computing the consensus GDT-TS scores, they will be considered to be similar and will discard one of them from the initial pool.

The original population pool contains protein models with different sequence lengths. Decoys with the same sequence length will generate the same size of the contact map and weights matrixes. For this experiment, we only consider protein models with the same length as native sequence and remove the too long or too short sequence for each target.

After removing redundant structures and partial length structures, we pick the best 200 models as our initial population. But actually, after rigorous screening, for some targets, the size of initial pool may be less than 200. The smallest population for this experiment is 164 for target T0711. After the first iteration, howover, the size of population would be kept as 200.

3.2 Evaluation

Good model quality assessment methods are essential and crucial for the end users of a model. They could contribute to the improvement of predicted models as well as the selection of the most likely model in a population. ProQ2 as a new single scoring function method has good performance in CASP10 competition, which takes one single structure as input and assigns a score. Consensus GDT-TS was widely used in protein models prediction, which assigns a score to each model as its average structural similarity (GDT score) to all other models in the set. If most predicted protein models have good quality for one protein model, the consensus GDT-TS could find the best predicted model for this target. We use TM-score program to get the pairwise similarity value between pairs. It can be used to calculate the GDT-TS score and RMSD value and evaluate the quality of predicted protein models comparing to native one.

In this experiment, single scoring function ProQ2 and consensus based GDT-TS involved as the protein quality assessment methods.

Figures 3.2 and 3.3 show examples of initial population for T0654.

Figure 3. 2 ProQ2 and GDT-TS of initial population for T0654

Figure 3. 3 CGDT-TS and GDT-TS of initial population for T0654

3.3 Selection

In the evolutionary algorithm, roulette wheel selection is the most widely used selection method. This selection is based on the probability of individual, which is proportional to their fitness values. The probability segments of individuals constituted a roulette wheel; the larger the segment size is, the more chances to be chosen. The selection probability Pi for individual i defined as follow:

(5)

where f1, f2,…,fn are the fitness values of individual 1, 2,…, n.

The roulette wheel selection gives a chance to all individuals. Therefore, it can keep the diversity in the population [46]. In this experiment, we chose 100 pairs as parents set. Figures 3.4 and 3.5 show the selection probability for T0654.

Figure 3. 4 Selection probability of ProQ2 for T0654

Figure 3. 5 Selection probability of CGDT-TS for T0654

3.4 Crossover

The parents are chosen by employing roulette wheel selection. Then the following methods are used to combine them together.

3.4.1 MDS-based method

The contact map of protein models reflects the distance relationship of each residue. Because the protein models are not always accurate in some segments, we need to combine parts of contact map with added the weights on the contact map to breed new generation. The WMDS-based crossover algorithm is shown in Figure 3.6.

Figure 3. 6 The WMDS-based crossover (WMDS-C) algorithm

The distance between amino acids of a protein may be confident in some part, but not all of them. ProQ2 program can evaluate each residue of a protein. Those quality values are used to construct the weights matrix, and added to the distance matrix to refine the protein 3D structure. In the following part, four different weight matrices are constructed with qi representing the quality of residue i in a protein using ProQ2.

Weights matrix w1 shown by Eq. (6) is the multiply quality of each residue pairs.

(6)

Except for weight matrix w1, the following weight matrices are also tested, but the results are worse for those matrices which are unable to receive a ProQ2 score. That is because the formula of weight matrix w1 works as probability of distance between each pairs of amino acid in a protein models.

Weight matrix w2 shown as in Eq. (7) is the sum of quality of each residue pairs.

(7)

Weights matrix w3 as in equation (9) is the multiply of quality of each residue pairs firstly, divided by the sum of the quality of all residue pairs.

(8)

(9)

Weights matrix w4 as in Eq. (11) is the sum the quality of each residue pairs, divided by the sum of the quality of all residue pairs.

(10)

(11)

After selecting the parents, crossover is applied to generate the next generation. There are many methods to combine two different 3D structures. The better feature would be kept, and the bad ones are abandoned. The “Average” combine method is very simple and it averages the contact maps and weight matrix of parents. Average merges two contact maps into one.

By using MATLAB_mdscale (non-classical multidimensional scaling) to implement the weight metric multidimensional scaling (WMDS), we allow those weights to maximize the fit to the original distances. Points with large weights will have a stronger influence on the results. The MATLAB function ‘MDSCALE’ is used to generate a three dimensional matrix M. Non-classical multidimensional scaling determines the type of scaling, non-metric or metric. Since the RMSD formula is used to get the contact map for each 3D protein model, it should be metric. The corresponding parameters are “metricstress”, “metricsstress”, “sammon”, and “strain”. After testing the parameters, “metricstress” is stable and has better performance, so “metricstress” is chosen as the default parameter when running the WMDS by MATLAB. In metric scaling, “MDSCALE” wants to configure points whose pairwise Euclidean distances is equal to the dissimilarities approximately.

After that, a Perl script xyz2pdb is used to reconstruct the PBD file of a prediction model from the coordinates which are derived by MDS scaling.

3.4.2 MODELLER-based method

In the MODELLER-based refinement algorithm, the roulette wheel selection is employed to select three parents as the template structure firstly, and then we extracted their alignments file that has the input sequences aligned fully with the template sequences and 3D coordinates of templates. They are combined together with a script as the input of MODELLER, and the default “automodel” modeling protocol in MODELLER is used. In this way, MODELLER will refold the models based on the input and give birth to a baby. The default number of templates is five, but three parents are chosen as a pair template in the calculation. If the templates are not very good models in the prediction pool, the next generation are also not good models. Better remodeling models could be obtained if the models in the pool are good. The MODELLER-based crossover algorithm is shown in Figure 3.7.

Figure 3. 7 The MODELLER-based crossover (MODELLER-C) algorithm

3.4.3 Hybrid method

In this algorithm, in order to compare this three crossover algorithms, the same number of protein models in each generation are generated. Firstly, the above MSD- based algorithm is repeated three times to generate enough parents, and then 100 sets of three models will run MODELLER and refold as refinement.

After generating the new breeds, the new models will compete with the original pool of their parents, then using the selection methods to keep the size of the pool and prepare for the next generation. The steps of selection, crossover and evaluation will be run iteratively until the end. Figure 3.8 shows the hybrid algorithm.

Figure 3. 8 The Hybrid crossover (Hybrid-C) algorithm

Chapter 4: EXPERIMENTAL Results

This chapter presents the results from the algorithms that have been implemented in accordance with the design in Chapter 3.

This project was written in Perl, MATLAB, C/C++ and ran on Red Hat Enterprise Linux Server 5.4 with 24 Intel Xeon(R) CPU X5660 processors running @ 2.80GHz, total 24GB memory installed, and Linux kernel at version 2.6.32-220.el6.x86_64. The MATLAB version is the newest MATLAB 8.2 with release name of R2013b.

4.1 Data Set

The whole system is using evolutionary algorithm to refine models iteratively. In this paper, the targets sequence information and templates are selected from CASP competition as test samples. Sixteen CSAP10 targets were used as experiment objects to generate the initial population for each target. At each time, an individual’s ProQ2 score or Consensus GDT-TS score is calculated, and the best predicted protein models was chosen in certain size of each target. After this, a model will be selected to reproduce by roulette wheel selection. Crossover is performed on two or three members of the population. The optional crossover refinement algorithm includes WMDS-based, MODELLER-based, and hybrid of WMDS-based and MODELLER-based methods. The evolutionary algorithm iterates over generations until the termination.

Table 4. 1 Summary of CASP10 refinement targets

Target

Residues

Average CGDT-TS

Best GDT-TS

Best RMSD

T0648

102

0.5663

0.8227

3.494

T0654

166

0.5822

0.7682

2.542

T0657

154

0.6325

0.8703

2.254

T0659

85

0.6347

0.9696

0.861

T0662

79

0.7473

0.8882

1.481

T0665

67

0.2395

0.9948

0.626

T0668

86

0.6501

0.4295

7.532

T0669

109

0.4613

0.683

2.908

T0673

88

0.2566

0.6532

5.349

T0675

75

0.4423

0.6096

4.4

T0678

161

0.2163

0.4172

6.28

T0680

119

0.2730

0.763

3.298

T0696

111

0.5571

0.7075

3.453

T0698

119

0.5391

0.6471

3.916

T0700

86

0.3450

0.9643

0.802

T0709

33

0.6516

0.9896

0.685

The 3rd, 4th and 5th columns are the CGDT-TS of whole decoys for each target, the GDT-TS and RMSD of the best prediction model.

4.2 Computation Time

Figure 4.1 compares the computation time for different algorithms. In the figure, P and C stand for ProQ2 and Consensus GDT-TS, respectively, in the quality evaluation part. 2P and 3P stand for two or three parents in crossover, respectively. In crossover, there are three methods: WMD-based method, MODERLLER-based method, and combined WMDS and MODELLER-based method. For example, P-2P-WMDS means using ProQ2 program to evaluate the protein quality, selecting two models as parents and using WMDS-based crossover. C-2P-WMDS uses consensus GDT-TS as the evaluation method compared to P-2P-WMDS.

As shown in Figure 4.1, the computation time for WMDS-based refinement is shorter than MODELLER-based refinement. Two or three parents’ combination within WMDS-based refinement did not make a significant difference in the crossover section because the geometry computation runs very fast on the Linux system. The overall computation time for hybrid refinement method is not equal to the sum of three times computation time for running WMDS-based and MODELLER-based refinement.

Figure 4. 1 The all computation time for different algorithms

Figure 4.2 shows a summary result for the computation time in each step for different algorithms. The computation time for MODELLER-based refinement using ProQ2 as QA method over WMDS-based refinement is 5.9, the ratio is 2.5 when using CGDT-TS as QA method. CGDT-TS selection method spent more than 8 minutes to assess the whole data set and prepared for the next generation in iteration, while ProQ2 selection method takes less than 2 minutes. ProQ2 as one single model method was running very fast which the predicted models had the same sequence length and used the same base name file. Because CGDT-TS compared each pairs of prediction models in the population, the computation time higher than Proq2 evaluation method. For the same quality assessment method, because the size of population is the same, the computation times for predicted protein models evaluation are very similar. In the step of selection, the implement time is very short.

Figure 4. 2 The average iteration time in each step for different algorithms

Figure 4.3 shows the computation time for WMDS-based refinement. In the WMDS-based crossover method, most computation time was spent on running the WMDS in MATLAB program which spent more than 1 minute to get the coordinates of 100 models.

Figure 4. 3 The computation time for steps in WMDS-based EA of each iteration

4.3 Results4.3.1 Refinement results using ProQ2 as quality assessment method

In Table 4.2, the best final models produced by evolutionary algorithms were selected based on the true GDT-TS against the native experimental structure using ProQ2 as quality assessment method. Figures 4.4 and 4.5 show the GDT-TS and RMSD of the best models obtained by ProQ2 evaluation, respectively.

P-2P-WMDS refinement algorithm is using ProQ2 as protein model quality assessment method and WMDS-based crossover method. It could get better models with higher true GDT-TS in 9 out of 16 and best 5 target models didn’t change during the refinement. For target T0680, the true GDT-TS was even increased from 0.763 to 0.8333 and its RMSD was decreased from 3.298 to 2.502. P-3P-WMDS chooses three models as parents and uses WMDS-based crossover. It didn’t improve the best model quality and even got worse model for target T0673. In MODELLER-based refinement, the maximum improvement occurred on target T0662 and its true GDTS-TS was improved 0.046. 8 out of 16 GST-TS of the best models was improved but RMSD didn’t change with 2 of 16 decreased. P-2P-WMDS-MODELLER combined WMSD-based and MODELLER-based refinement and it could also improve the best model’s quality. The biggest improvement happened to target T0669 and its GDT-TS increased by 0.0412. The RMSD of the best models had been improved by crossover evolutionary algorithm but their GDT-TS were not increased. The predicted model quality didn’t relate on its geometry similarity with the native structure.

Table 4. 2 Summary of refinement results on CASP10 targets using ProQ2

Target

Initial Best

P-2P-WMDS

P-3P-WMDS

P-3P-MODELLER

P-2P-WMDS-

MODELLER

GDT-TS

RMSD

GDT-TS

RMSD

GDT-TS

RMSD

GDT-TS

RMSD

GDT-TS

RMSD

T0648

0.8227

3.494

0.8343

3.494

0.8227

3.494

0.8314

3.465

0.843

2.936

T0654

0.7682

2.542

0.7682

2.542

0.7682

2.542

0.7646

3.064

0.7518

2.817

T0657

0.8703

2.254

0.8741

2.218

0.8703

2.254

0.8759

2.194

0.8759

2.229

T0659

0.9696

0.861

0.973

0.8

0.9696

0.861

0.9696

0.797

0.9696

0.818

T0662

0.8882

1.481

0.9178

1.433

0.8882

1.481

0.9342

1.316

0.9013

1.453

T0665

0.9948

0.626

0.9948

0.626

0.9948

0.626

0.9948

0.671

0.9948

0.626

T0668

0.4295

7.532

0.4423

6.404

0.4295

7.532

0.4295

7.513

0.4423

6.277

T0669

0.683

2.908

0.7036

2.876

0.683

2.908

0.7165

2.85

0.7242

2.908

T0673

0.6532

5.349

0.6532

5.349

0.4395

11.06

0.6532

5.302

0.6532

5.349

T0675

0.6096

4.4

0.5965

4.509

0.6096

4.4

0.5789

5.048

0.5965

4.509

T0678

0.4172

6.28

0.4318

6.139

0.4172

6.28

0.4221

7.054

0.4172

6.28

T0680

0.763

3.298

0.8333

2.502

0.763

3.298

0.763

3.944

0.763

3.298

T0696

0.7075

3.453

0.705

3.453

0.7075

3.453

0.7475

3.892

0.705

4.127

T0698

0.6471

3.916

0.6702

3.982

0.6471

3.916

0.6723

4.019

0.6618

3.982

T0700

0.9643

0.802

0.9643

0.802

0.9643

0.802

0.8714

1.313

0.8429

1.641

T0709

0.9896

0.685

0.9896

0.694

0.9896

0.685

1

0.51

0.9792

0.768

Figure 4. 4 The GDT-TS of best model using ProQ2 evaluation

Figure 4. 5 The RMSD of best model using ProQ2 evaluation

Figures 4.6 and 4.7 show the average GDT-TS of best 10 and all models using ProQ2 evaluation method, respectively. Only 3 of best 10 models’ average GDT-TS were not improved in this experiment by P-2P-WMDS evolutionary algorithm. For target T0698, P-2P-WMDS, P-3P-MODELLER, P-2P-WMDS-MODELLER methods raised the average GDT-TS of best 10 models to 0.0427, 0.0501 and 0.0358, respectively. The average quality of total generated models was improved except P-3P-WMDS EA method. From Fig. 4.7, the average GDT-TS of all models went up 0.1474, 0.1645 and 0.1534, respectively for target T0657.

Figure 4. 6 The average GDT-TS of best 10 models using ProQ2 evaluation

P-3P-WMDS method couldn’t achieve better models sometimes and even got worse models. This may be due to involvement of too many parents that make a mess of the constructed contact maps and weight matrixes. P-2P-WMDS utilized the geometry benefits of both parents and added the appropriate probability weights which are generated by ProQ2 program to improve the predicted model quality.

Figure 4. 7 The average GDT-TS of all models using ProQ2 evaluation

4.3.2 Refinement results using CGDT-TS as quality assessment method

Table 4.3 describes the results of EAs on CASP10 refinement target by CGDT-TS evaluation.

According to the Table 4.3, MDS-based and MODELLER-based refinement algorithms using CGDT-TS evaluation method failed to achieve better predicted models. The reason for this is that CGDT-TS compared all individual models and similar models could get higher score, using CGDT-TS evaluation the number of generated offspring is too big and most of them are similar and has poor quality, so it can’t keep the good predicted protein models in iteration of the pool.

Table 4. 3 Summary of refinement results on CASP10 targets using CGDT-TS

Target

Initial Best

C-2P-WMDS

C-3P-WMDS

C-3P-MODELLER

C-2P-WMDS-

MODELLER

GDT-TS

RMSD

GDT-TS

RMSD

GDT-TS

RMSD

GDT-TS

RMSD

GDT-TS

RMSD

T0648

0.8227

3.494

0.811

4.474

0.314

7.408

0.8169

4.544

0.7965

5.017

T0654

0.7682

2.542

0.7427

3.259

0.2682

7.25

0.7518

3.383

0.7445

3.46

T0657

0.8703

2.254

0.8722

2.303

0.2556

7.339

0.8722

2.225

0.8609

2.319

T0659

0.9696

0.861

0.9561

0.855

0.3581

5.292

0.9561

0.923

0.9527

0.941

T0662

0.8882

1.481

0.8257

1.79

0.3388

5.848

0.8059

1.86

0.8224

1.788

T0665

0.9948

0.626

0.9583

0.837

0.4479

4.343

0.9635

0.827

0.9635

0.8

T0668

0.4295

7.532

0.391

7.049

0.266

9.055

0.4199

7.308

0.375

7.767

T0669

0.683

2.908

0.6649

4.2

0.2784

7.468

0.6495

4.154

0.6418

4.757

T0673

0.6532

5.349

0.3508

9.259

0.2177

13.395

0.4758

11.139

0.3508

10.399

T0675

0.6096

4.4

0.5

7.233

0.2588

11.147

0.5044

7.058

0.5132

7.07

T0678

0.4172

6.28

0.4026

6.479

0.0828

16.182

0.3782

7.72

0.3653

7.31

T0680

0.763

3.298

0.4766

12.546

0.763

3.298

0.5677

12.413

0.4688

11.485

T0696

0.7075

3.453

0.5025

12.041

0.2175

14.913

0.5075

11.946

0.5025

11.937

T0698

0.6471

3.916

0.6176

4.702

0.2773

7.494

0.5987

4.941

0.6324

4.722

T0700

0.9643

0.802

0.8786

1.448

0.3214

7.492

0.9786

0.793

0.8429

1.531

T0709

0.9896

0.685

0.9792

0.636

0.5521

3.162

0.9896

0.679

0.9688

0.852

Table 4.3 Summary of refinement results on CASP10 targets using CGDT-TS

Figures 4.8 and 4.9 are showing the GDT-TS and RMSD of the best models using CGDT-TS evaluation, respectively. Almost all the generated best protein model for each target with CGDT-TS evaluation and running for 10 iterations were worse than the initial best model in the template pool. The similar result shows in the Fig. 4.10 which is the average GDT-TS for best 10 models using CGDT-TS evaluation.

Figure 4. 8 The GDT-TS of best model using CGDT-TS evaluation

Figure 4. 9 The RMSD of best model using CGDT-TS evaluation

Figure 4. 10 The average GDT-TS of best 10 models using CGDT-TS evaluation

While it can be seen from Fig. 4.11 that the average GDT-TS for all models using CGDT-TS evaluation method is improved except C-3P-MODELLER evolutionary algorithm.

Figure 4. 11 The average GDT-TS of all models using CGDT-TS evaluation

Figure 4.12 shows the average improved GDT-TS for different algorithms. Except choosing three models as parents with WMDS-based crossover, overall predicted models in the population are improved. Many evidence shows that more than two parents can increase the performance of EAs [47]. But studies on different methods and different types of fitness should be considered, in this experiment, choosing three models as parents failed to reach good results. For the top 1 and top10 models, the average improved GDT-TS were not significant.

Figure 4. 12 The average improved GDT-TS for different algorithms

Despite not being conclusive, the study indicated that using MDS-based crossover method to refold existing predicted models can be a promising approach to improve the best predicted protein models’ quality.

Chapter 5: CONCLUSIONS AND FUTURE WORK

The current project applied evolutionary algorithm framework and implemented WMDS-based, MODELLER-based, hybrid of MDS-based and MODELLER-based crossover algorithms to refine the predicted protein model structures. The performance of the evolutionary algorithm is compared based on the ProQ2 and CGDT-TS evaluation.

In WMDS-based crossover algorithm, choosing three models as parents is hard to get better models. Using CGDT-TS as QA method replaced the last population too fast with bad models that couldn’t improve the overall model quality. In this method, using ProQ2 as QA method and choosing two models as parents could achieve better predicted models.

In MODELLER-based crossover algorithm, the average computation time for each target is more than 3.5 hours. It could achieve better models in some cases.

The hybrid method could attain few better models sometimes, but it can’t improve the GDT-TS for each target on top1 model and the average GDT-TS of top10 models.

Comparing those three EAs, the computation time for MODELLER-based method are slow than WMDS-based method; using ProQ2 method to evaluate the models’ quality is fast than CGDT-TS method; except choosing three models as parents, all methods could improve the overall quality of the population; only P-2P-WMDS method could improve the GDT-TS of each target on top1, the average GDT-TS of top 10 and all population.

The performance of WMDS-based method with three parents is very bad and the reason are still not clear, so multi-parents for WMDS-based crossover algorithm needs more research on it.

In the CASP refinement problem, the best starting model will be provided, but this experiment only consider the whole population for each target, more experiment should be done for the next CASP competition.

In this study, the global ProQ2 score was used as protein quality assessment method and the local ProQ2 score was used to construct the weight matrices. For further study, other promising program such as IDDT [48] could be used in this evolutionary algorithm to see whether better results could be obtained.

Another problem is how to pick the best model from the finial population using better quality assessment method. Evolutionary algorithm as a stochastic searching process could get the similar performance for different runs. The final results may be different and the exactly same results couldn’t be obtained in this experiment.

REFERENCES

[1] C. Anfinsen, "Principles that govern the folding of protein chains," Science, vol. 181, pp. 223-230, 1973. [2] C. Anjum A, "Protein Tertiary Model Assessment Using Granular," 2012.[3] E. Reynaud, "Protein Misfolding and Degenerative Diseases," Nature Education, vol. 3, no. 9, p. 28, 2010. [4] M.S. Johnson, N. Srinivasan, R.Sowdhamini, TL.Blundell, "Knowledge-based protein modeling," Crit Rev Biochem Mol Biol, vol. 29, pp. 1-68, 1994. [5] D. Bhattacharya, J.Cheng, "i3Drefine Software for Protein 3D Structure Refinement and Its Assessment in CASP10," PLoS ONE, vol. 8, no. 7, p. e69648, 2013. [6] L. Heo, H. Park and C. Seok, "GalaxyRefine: protein structure refinement driven by side-chain repacking," Nucleic Acids Research, vol. 41, pp. 384-388, 2013. [7] João P. G. L. M.Rodrigues,M.Levitt and G.Chopra, "KoBaMIN: a knowledge-based minimization web server for protein structure refinement," Nucleic Acids Research, vol. 40, pp. 323-328, 2012. [8] D. Xu, J. Zhang, A. Roy, and Y. Zhang, "Automated protein structure modeling in CASP9 by I-TASSER pipeline combined with QUARK-based ab initio folding and FG-MD-based structure refinement," Proteins, vol. 79(Suppl 10), pp. 147-160, 2011. [9] H. Park and C. Seok, "Refinement of unreliable local regions in template-based protein models," Proteins, vol. 80, pp. 1974-1986, 2012. [10] H.Park,J.Ko, K.Joo, J.Lee, C. Seok, and J. Lee, "Refinement of protein termini in template-based modeling using conformational space annealing," Proteins, vol. 79, pp. 2725-2734, 2011. [11] J.L.MacCallum,L. Hua,M.J. Schnieders,V.S. Pande,M.P.Jacobson and K.A.Dill, "Assessment of the protein structure refinement category in CASP8," Protein, vol. 77(Suppl.9), pp. 66-80, 2009. [12] J.L.MacCallum,A.Perez, M.J.Schnieders,L.Hua,M.P.Jacobson, and K.A.Dill, "Assessment of protein structure refinement in CASP9," Proteins, vol. 79(Suppl.10), pp. 74-90, 2011. [13] T.Nugent,D.Cozzetto and D.T.Jones, "Evaluation of predictions in the CASP10 Model Refinement Category(Accepted Article)," Proteins:Structure,Function,and Bioinformatics, 2013. [14] J. Holland, "Adaptation in Natural and Artificial Systems," in University of Michigan Press, Michigan, 1975. [15] N.Mansour, F.Kanj,H.Khachfe, "Evolutionary Algorithm for Protein Structure Prediction," in Natural Computation (ICNC), 2010 Sixth International Conference, Yantai,ShanDong, 2010. [16] A. Warmflash, P. Francois and E.D. Siggia, "Pareto evolution of gene networks: an algorithm to optimize multiple fitness objectives," Physucal Biology, vol. 9, no. 5, 2012. [17] Y.Zhang, L.Wu, "Artificial Bee Colony for Two Dimensional Protein Folding," Advances in Electrical Engineering Systems, vol. 1, no. 1, pp. 19-23, 2012. [18] MT.Oakley, EG.Richardson, H.Carr, RL.Johnston, "Protein Structure Optimisation With a "Lamarckian" Ant Colony Algorithm," IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, 2013. [19] J. Skolnick, "In quest of an empirical potential for protein structure," Curr Opin Struct Biol, vol. 16, pp. 166-171, 2006. [20] Q. Wang, Y.Shang, and D. Xu, "Improving a Consensus Approach for Protein Structure Selection by Removing Redundancy," IEEE/ACM transactions on computational biology and bioinformatics, vol. 8, no. 6, pp. 1708-1715, 2011. [21] Y.Zhang and J.Skolnick, "Scoring function for automated assessment of protein structure template quality," Proteins, vol. 57, no. 4, pp. 702-710, 2004. [22] Y.Zhang and J.Skolnick, "TM-align: a protein structure alignment algorithm based on the TM-score," Nucleic Acids Res, vol. 33, no. 7, pp. 2302-2309, 2005. [23] A. Zemla, "LGA: a method for finding 3D similarities in protein structures," Nucleic acids research, vol. 31, pp. 3370-3374, 2003. [24] R.J. Randy and G. Chavali, "Assessment of CASP7 predictions in the high accuracy template-based modeling category," Proteins, vol. 69, no. s8, pp. 27-37, 2007. [25] Y. Wu,M. Lu,M. Chen,J. Li and J. Ma, "OPUS-Ca: A knowledge-based potential function requiring only Ca positions," Protein Science, vol. 16, no. 7, pp. 1449-1463, 2007. [26] Y. Yang, Y. Zhou, "Specific interactions for ab initio folding of protein terminal regions with secondary structures," Proteins, vol. 72, pp. 793-803, 2008. [27] J.Zhang,Y.Zhang, "A Novel Side-Chain Orientation Dependent Potential Derived from Random-Walk Reference State for Protein Fold Selection and Structure Prediction," PLoS ONE, vol. 5, no. 10, pp. 1-13, 2010. [28] H. Zhou and J. Skolnick, "GOAP: A Generalized Orientation-Dependent, All-Atom Statistical Potential for Protein Structure Prediction," Biophysical Joural, vol. 101, pp. 2043-2052, 2011. [29] A. Ray, E. Lindahl and B. Wallner, "Improved model quality assessment using ProQ2," BMC Bioinformatics, vol. 13, p. 224, 2012. [30] M.Vassura,L. Margara,P. Di Lena, F.Medri,P.Fariselli and R.Casadio, "Reconstruction of 3D Structures From Protein Contact Maps," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 3, pp. 357-367, 2008. [31] A.N. Tegge, Z. Wang, J. Eickholt, and J. Cheng, "NNcon: Improved Protein Contact Map Prediction Using 2D-Recursive Neural Networks," Nucleic Acids Research, vol. 37, pp. w515-w518, 2009. [32] J. Bacardit, P. Widera,A. Márquez-Chamorro,F.Divina,J. S. Aguilar-Ruiz,and N.Krasnogor, "Contact map prediction using a large-scale ensemble of rule sets and the fusion of multiple predicted structural features," Bioinformatics, vol. 28, no. 19, pp. 2441-2448, 2012. [33] Z.Wang and J. Xu, "Predicting protein contact map using evolutionary and physical constraints by integer programming (extended version)," Bioinformatics, vol. 29, no. 13, pp. 266-273, 2013. [34] W.Ding, J.Xie,D.Dai,H.Zhang,H.Xie and W.Zhang, "CNNcon: Improved Protein Contact Maps Prediction Using Cascaded Neural Networks," PLoS ONE, vol. 8, no. 4, p. e61533, 2013. [35] W. Togerson, Theory and methods of scaling, New York: Wiley, 1958. [36] H. Abdi, "Metric multidimensional scaling. In: Salkind NJ, editor," in Encyclopedia of Measurement and Statistics, Thousand Oaks (CA), Sage, 2007, pp. 598-605.[37] Y.Takane,S.Jung and Y.Oshima-Takane, "Multidimensional scaling. In: Millsap R, Maydeu-Olivares A, editors," in Handbook of quantitative methods in psychology, London, Sage Publications, 2009, pp. 219-242.[38] F.Cox Trevor and A.A.Cox Michael, Multidimensional Scaling(Second Edition), Chapman and Hall/CRC, 2000. [39] J.Pelé,H.Abdi,M.Moreau,D.Thybert and M.Chabbert, "Multidimensional Scaling Reveals the Main Evolutionary Pathways of Class A G-Protein-Coupled Receptors," PLoS ONE, vol. 6, no. 4, p. e19094, 2011. [40] H.Mogi and Y-h.Taguchi, "Protein binding prediction using non-metric multidimensional scaling method," in Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference, Atlanta, GA, 2011. [41] D.Holtby, S.Cheng Li and M. Li, "LoopWeaver: Loop Modeling by the Weighted Scaling of Verified Proteins," Journal of Computational Bioligy, vol. 20, pp. 212-223, 2013. [42] A.Sali and TL. Blundell, "Comparative Protein Modeling by Satisfaction of Spatial Restraints," Journal of Molecular Biology, vol. 234, no. 3, pp. 779-815, 1993. [43] A. Fiser, R.K. Do, and A. Sali, "Modeling of loops in protein structures," Protein Science, vol. 9, pp. 1753-1773, 2000. [44] N. Eswar, M. A. Marti-Renom, B. Webb, M. S. Madhusudhan, D. Eramian, M. Shen, U. Pieper, A. Sali, "Comparative Protein Structure Modeling With MODELLER," Current Protocols in Bioinformatics, John Wiley & Sons, Inc., vol. Supplement 15, pp. 5.6.1-5.6.30, 2006. [45] Q.Wang, K. Vantasin,D.Xu,and Y. Shang, "MUFOLD-WQA: A new selective consensus method for quality assessment in protein structure prediction," Proteins, no. 79(Suppl 10), pp. 185-195, 2011. [46] M.R. Norainii,J. Geraghty, "Genetic Algorithm Performance with Different Selection Strategies in Solving TSP," 2011. [47] L. D. Chambers, Practical Handbook of Genetic Algorithms: Complex Coding Systems, Volume 3, CRC Press, 1998. [48] V. Mariani,M. Biasini,A.Barbato and T.Schwede, "IDDT:a local superposition-free score for comparing protein structures and models using distance difference tests," Bioinformatics, vol. 29, no. 21, pp. 2722-2728, 2013.

ProQ2 and GDT-TS of initial population for T0654

GDT-TS0.625211999999999990.561096000000000040.614352000000000010.755763999999999990.521280999999999990.653540000000000010.499931000000000010.578948000000000020.584830999999999990.604536999999999990.552692999999999990.550664999999999960.507280999999999980.583932000000000010.499790000000000010.511618000000000020.614866000000000020.690555000000000030.663722999999999950.4867630.498856000000000020.553737999999999950.507410000000000030.514562999999999990.505885999999999950.558486000000000040.513241999999999980.590451999999999980.617917999999999970.625797999999999970.642889999999999960.701297999999999980.6309650.626051000000000020.551177000000000030.566182999999999990.5605890.664494999999999950.480775999999999980.548737999999999950.604157999999999970.567860000000000030.444496999999999980.564595999999999990.497163999999999990.6172050.434495000000000020.6472580.610227999999999990.471509000000000010.424123000000000030.630282000000000010.662386999999999950.530638999999999970.4567290.539750999999999980.481016999999999970.472857000000000030.5680520.472121000000000010.566864000000000030.648642000000000050.632047000000000030.512595000000000020.624468000000000020.596458999999999960.6132010.478434000000000030.651753000000000030.649590.688536000000000040.378000999999999980.446409999999999970.533039999999999960.401071999999999980.537007000000000010.416949999999999990.660278000000000030.624931000000000010.522190000000000040.406378999999999990.713138999999999970.526228000000000030.6272780.627248999999999950.664582000000000010.432630000000000010.485702000000000020.588629999999999990.462961000000000010.584470999999999960.616612999999999970.635469999999999980.455477999999999990.574273000000000030.5303310.510684999999999940.4972240.596488000000000020.636766999999999970.456676000000000030.6319920.437186000000000020.396677999999999980.538480999999999990.612017000000000030.442614999999999980.458724999999999990.534756000000000010.559417999999999970.411716000000000030.557876999999999960.562443000000000030.521510000000000030.544792000000000050.417750999999999980.478360999999999980.4889250.510027000000000010.631535999999999990.658000999999999950.447502000000000010.468687999999999990.512279999999999960.496736000000000010.425748999999999990.451761000000000020.603323999999999970.629371000000000010.541452999999999960.5102390.486491000000000010.593432999999999990.521410999999999960.485412999999999980.447732000000000020.461444000000000020.489439999999999990.504604000000000050.398139999999999990.514583000000000010.4705820.495539999999999980.572242999999999950.467860.558020999999999990.5165440.430456000000000010.347986999999999990.433696000000000030.529444000000000030.532973999999999950.374018999999999990.471173000000000010.467708000000000010.614705999999999970.617010999999999980.652970000000000050.545286999999999970.535205000000000040.575861999999999990.617470999999999990.583590000000000050.615999999999999990.548714999999999950.574965999999999980.539421000000000040.539224000000000040.604984999999999990.5978040.574981999999999990.592914000000000050.589744000000000050.551719999999999990.567034999999999960.304819000000000010.363267999999999980.366236999999999980.350849000000000029.8017999999999994E-20.155300999999999990.2007090.728099999999999970.74090.733600000000000030.751800000000000020.760900000000000020.733600000000000030.717199999999999950.729899999999999990.726299999999999950.706200000000000050.728099999999999970.750.746399999999999950.728099999999999970.746399999999999950.748199999999999980.713500000000000020.726299999999999950.731800000000000010.753600000000000050.746399999999999950.715300000000000050.751800000000000020.757299999999999970.71170.709899999999999980.717199999999999950.706200000000000050.713500000000000020.724500000000000030.733600000000000030.717199999999999950.751800000000000020.724500000000000030.70260.737199999999999970.739099999999999980.726299999999999950.713500000000000020.71170.680699999999999970.695300000000000030.718999999999999970.680699999999999970.757299999999999970.68250.71170.678799999999999960.768199999999999990.70260.751800000000000020.689799999999999970.678799999999999960.755499999999999950.713500000000000020.695300000000000030.74090.753600000000000050.691599999999999990.706200000000000050.731800000000000010.677000000000000050.717199999999999950.704400000000000030.677000000000000050.686100000000000040.675200000000000020.713500000000000020.733600000000000030.713500000000000020.750.72080.71170.737199999999999970.713500000000000020.74090.757299999999999970.72080.704400000000000030.691599999999999990.713500000000000020.739099999999999980.704400000000000030.691599999999999990.713500000000000020.724500000000000030.697100000000000050.713500000000000020.693400000000000020.72080.715300000000000050.744500000000000050.687999999999999940.709899999999999980.671499999999999990.691599999999999990.689799999999999970.70260.698899999999999970.72080.704400000000000030.678799999999999960.678799999999999960.687999999999999940.686100000000000040.655100000000000020.666100000000000030.669699999999999960.656900000000000040.644199999999999990.687999999999999940.737199999999999970.704400000000000030.693400000000000020.671499999999999990.664200000000000010.671499999999999990.704400000000000030.678799999999999960.666100000000000030.675200000000000020.664200000000000010.649599999999999960.667900000000000050.700699999999999990.678799999999999960.691599999999999990.67340.662399999999999990.658800000000000050.655100000000000020.693400000000000020.669699999999999960.684300000000000020.697100000000000050.697100000000000050.686100000000000040.647800000000000040.658800000000000050.655100000000000020.678799999999999960.633199999999999990.622299999999999960.644199999999999990.616800000000000010.697100000000000050.622299999999999960.605800000000000010.602199999999999960.642299999999999980.649599999999999960.631399999999999960.613099999999999980.596700000000000010.605800000000000010.428800000000000010.405100000000000020.421499999999999990.357700000000000020.363099999999999980.363099999999999980.414200000000000010.361300000000000010.42340.368599999999999980.3120.313900000000000010.313900000000000010.31750.31750.30290.308400000000000010.308400000000000010.295599999999999970.291999999999999980.26460.155099999999999990.155099999999999990.155099999999999990.184299999999999990.191599999999999990.16420000000000001

ProQ2

GDT-TS

CGDT-TS and GDT-TS of initial population for T0654

GDT-TS0.664796999999999970.663507999999999990.666985999999999970.6654080.660626999999999960.662379999999999970.660475999999999950.658676000000000040.661432000000000020.659923000000000040.658213999999999970.656799000000000020.655259999999999950.656537999999999950.655413000000000020.655312000000000010.657503999999999980.655947999999999980.655849000000000020.654560000000000030.654136000000000050.654044999999999990.653204000000000010.651553999999999970.654379000000000040.652166000000000020.6523890.652445999999999970.653175999999999980.651386000000000020.651321000000000040.649383999999999960.6492040.650818000000000010.648526999999999960.649217999999999960.646105999999999960.647120999999999950.643121000000000050.642440000000000010.6448180.644530000000000050.641760.643124000000000030.640317000000000030.641109999999999960.639997000000000040.642955000000000050.639491999999999950.639349999999999970.639368000000000050.642256999999999970.641564000000000020.637778999999999980.636665999999999950.638843000000000050.636413000000000010.635587000000000010.637256999999999960.634897999999999960.635118999999999990.636568000000000020.634881000000000030.632546000000000050.634657000000000030.634187999999999970.634541000000000020.631920000000000040.630928000000000040.629923000000000010.629627000000000050.630577999999999970.629361999999999980.626554000000000060.628858000000000030.626232000000000010.629252999999999950.627394000000000010.629597999999999990.6301310.627118999999999980.6261660.629271000000000020.629692999999999950.629402000000000020.625020000000000020.625295999999999960.624009000000000040.6274670.623230999999999980.623939000000000020.623898999999999980.623650999999999960.620993000000000020.624669999999999950.618866000000000030.6176410.619454999999999980.621789999999999950.620812000000000030.617468000000000020.617534000000000030.6172010.616277999999999990.607635999999999950.606905999999999950.606049999999999980.603326999999999950.602682000000000050.601434000000000020.5980550.596882000000000020.593148000000000010.591567999999999980.584269999999999960.583115999999999970.5847040.588864000000000050.583806000000000050.583899999999999970.582431999999999950.581879999999999950.578389999999999960.583825999999999960.584100999999999980.581327000000000040.581158999999999980.576447999999999960.573559000000000040.570795000000000050.572281999999999960.570778000000000010.564817000000000010.567209999999999990.565586000000000030.564914999999999940.565262000000000040.564170999999999980.556061999999999950.551096000000000030.558814000000000030.5514270.546923999999999970.553835999999999990.544205000000000050.550601000000000010.544231000000000020.528208000000000010.528799000000000020.528614999999999950.525105999999999960.522641000000000020.508091999999999990.502692000000000030.486565000000000030.3269880.327077000000000010.326114999999999990.324460000000000030.322257999999999990.321892000000000010.319697999999999980.319525999999999980.317790999999999990.315433000000000020.262849999999999970.262392999999999990.262417999999999980.259788000000000020.2596330.254599000000000020.2544670.253803999999999970.253601999999999990.250512999999999990.233523000000000010.154989999999999990.154824999999999990.154790000000000010.1475050.145377000000000010.145467999999999990.728099999999999970.74090.733600000000000030.751800000000000020.760900000000000020.733600000000000030.717199999999999950.729899999999999990.726299999999999950.706200000000000050.728099999999999970.750.746399999999999950.728099999999999970.746399999999999950.748199999999999980.713500000000000020.726299999999999950.731800000000000010.753600000000000050.746399999999999950.715300000000000050.751800000000000020.757299999999999970.71170.709899999999999980.717199999999999950.706200000000000050.713500000000000020.724500000000000030.733600000000000030.717199999999999950.751800000000000020.724500000000000030.70260.737199999999999970.739099999999999980.726299999999999950.713500000000000020.71170.680699999999999970.695300000000000030.718999999999999970.680699999999999970.757299999999999970.68250.71170.678799999999999960.768199999999999990.70260.751800000000000020.689799999999999970.678799999999999960.755499999999999950.713500000000000020.695300000000000030.74090.753600000000000050.691599999999999990.706200000000000050.731800000000000010.677000000000000050.717199999999999950.704400000000000030.677000000000000050.686100000000000040.675200000000000020.713500000000000020.733600000000000030.713500000000000020.750.72080.71170.737199999999999970.713500000000000020.74090.757299999999999970.72080.704400000000000030.691599999999999990.713500000000000020.739099999999999980.704400000000000030.691599999999999990.713500000000000020.724500000000000030.697100000000000050.713500000000000020.693400000000000020.72080.715300000000000050.744500000000000050.687999999999999940.709899999999999980.671499999999999990.691599999999999990.689799999999999970.70260.698899999999999970.72080.704400000000000030.678799999999999960.678799999999999960.687999999999999940.686100000000000040.655100000000000020.666100000000000030.669699999999999960.656900000000000040.644199999999999990.687999999999999940.737199999999999970.704400000000000030.693400000000000020.671499999999999990.664200000000000010.671499999999999990.704400000000000030.678799999999999960.666100000000000030.675200000000000020.664200000000000010.649599999999999960.667900000000000050.700699999999999990.678799999999999960.691599999999999990.67340.662399999999999990.658800000000000050.655100000000000020.693400000000000020.669699999999999960.684300000000000020.697100000000000050.697100000000000050.686100000000000040.647800000000000040.658800000000000050.655100000000000020.678799999999999960.633199999999999990.622299999999999960.644199999999999990.616800000000000010.697100000000000050.622299999999999960.605800000000000010.602199999999999960.642299999999999980.649599999999999960.631399999999999960.613099999999999980.596700000000000010.605800000000000010.428800000000000010.405100000000000020.421499999999999990.357700000000000020.363099999999999980.363099999999999980.414200000000000010.361300000000000010.42340.368599999999999980.3120.313900000000000010.313900000000000010.31750.31750.30290.308400000000000010.308400000000000010.295599999999999970.291999999999999980.26460.155099999999999990.155099999999999990.155099999999999990.184299999999999990.191599999999999990.16420000000000001

CGDT-TS

GDT-TS

Selection probablity of ProQ2 for T0654

Probablity0.625211999999999990.561096000000000040.614352000000000010.755763999999999990.521280999999999990.653540000000000010.499931000000000010.578948000000000020.584830999999999990.604536999999999990.552692999999999990.550664999999999960.507280999999999980.583932000000000010.499790000000000010.511618000000000020.614866000000000020.690555000000000030.663722999999999950.4867630.498856000000000020.553737999999999950.507410000000000030.514562999999999990.505885999999999950.558486000000000040.513241999999999980.590451999999999980.617917999999999970.625797999999999970.642889999999999960.701297999999999980.6309650.626051000000000020.551177000000000030.566182999999999990.5605890.664494999999999950.480775999999999980.548737999999999950.604157999999999970.567860000000000030.444496999999999980.564595999999999990.497163999999999990.6172050.434495000000000020.6472580.610227999999999990.471509000000000010.424123000000000030.630282000000000010.662386999999999950.530638999999999970.4567290.539750999999999980.481016999999999970.472857000000000030.5680520.472121000000000010.566864000000000030.648642000000000050.632047000000000030.512595000000000020.624468000000000020.596458999999999960.6132010.478434000000000030.651753000000000030.649590.688536000000000040.378000999999999980.446409999999999970.533039999999999960.401071999999999980.537007000000000010.416949999999999990.660278000000000030.624931000000000010.522190000000000040.406378999999999990.713138999999999970.526228000000000030.6272780.627248999999999950.664582000000000010.432630000000000010.485702000000000020.588629999999999990.462961000000000010.584470999999999960.616612999999999970.635469999999999980.455477999999999990.574273000000000030.5303310.510684999999999940.4972240.596488000000000020.636766999999999970.456676000000000030.6319920.437186000000000020.396677999999999980.538480999999999990.612017000000000030.442614999999999980.458724999999999990.534756000000000010.559417999999999970.411716000000000030.557876999999999960.562443000000000030.521510000000000030.544792000000000050.417750999999999980.478360999999999980.4889250.510027000000000010.631535999999999990.658000999999999950.447502000000000010.468687999999999990.512279999999999960.496736000000000010.425748999999999990.451761000000000020.603323999999999970.629371000000000010.541452999999999960.5102390.486491000000000010.593432999999999990.521410999999999960.485412999999999980.447732000000000020.461444000000000020.489439999999999990.504604000000000050.398139999999999990.514583000000000010.4705820.495539999999999980.572242999999999950.467860.558020999999999990.5165440.430456000000000010.347986999999999990.433696000000000030.529444000000000030.532973999999999950.374018999999999990.471173000000000010.467708000000000010.614705999999999970.617010999999999980.652970000000000050.545286999999999970.535205000000000040.575861999999999990.617470999999999990.583590000000000050.615999999999999990.548714999999999950.574965999999999980.539421000000000040.539224000000000040.604984999999999990.5978040.574981999999999990.592914000000000050.589744000000000050.551719999999999990.567034999999999960.304819000000000010.363267999999999980.366236999999999980.350849000000000029.8017999999999994E-20.155300999999999990.2007096.4000000000000003E-35.8000000000000005E-36.2999999999999983E-37.8000000000000014E-35.3999999999999986E-36.6999999999999976E-35.1000000000000004E-36.0000000000000053E-35.9999999999999984E-36.1999999999999972E-35.7000000000000037E-35.6999999999999967E-35.1999999999999963E-36.0000000000000053E-35.1999999999999963E-35.2000000000000102E-36.399999999999989E-37.1000000000000091E-36.8000000000000005E-34.9999999999999906E-35.2000000000000102E-35.6999999999999829E-35.2000000000000102E-35.2999999999999992E-35.2000000000000102E-35.6999999999999829E-35.2999999999999992E-36.1000000000000221E-36.3E-36.499999999999978E-36.6000000000000225E-37.1999999999999842E-36.5000000000000058E-36.5000000000000058E-35.5999999999999939E-35.8999999999999886E-35.7000000000000106E-36.8999999999999895E-34.9000000000000155E-35.6999999999999829E-36.2000000000000111E-35.7999999999999996E-34.599999999999993E-35.8000000000000274E-35.0999999999999934E-36.3999999999999613E-34.500000000000004E-36.5999999999999948E-36.3000000000000278E-34.799999999999971E-34.400000000000015E-36.5000000000000058E-36.8000000000000282E-35.5000000000000049E-34.699999999999982E-35.5000000000000049E-35.0000000000000044E-34.8999999999999599E-35.8000000000000274E-34.9000000000000155E-35.7999999999999718E-36.6999999999999837E-36.5000000000000058E-35.2000000000000379E-36.5000000000000058E-36.0999999999999943E-36.2999999999999723E-35.0000000000000044E-36.6999999999999837E-36.5999999999999948E-37.1000000000000507E-33.8999999999999591E-34.599999999999993E-35.5000000000000049E-34.0999999999999925E-35.6000000000000494E-34.1999999999999815E-36.7999999999999727E-36.5000000000000058E-35.4000000000000159E-34.0999999999999925E-37.4000000000000177E-35.4000000000000159E-36.3999999999999613E-36.5000000000000058E-36.8000000000000282E-34.4999999999999485E-35.0000000000000044E-36.0999999999999943E-34.7000000000000375E-36.0000000000000053E-36.3999999999999613E-36.5000000000000613E-34.6999999999999265E-35.9000000000000163E-35.5000000000000604E-35.1999999999999824E-35.0999999999999934E-36.1999999999999833E-36.4999999999999503E-34.7000000000000375E-36.5000000000000613E-34.4999999999999485E-34.0999999999999925E-35.6000000000000494E-36.2999999999999723E-34.4999999999999485E-34.7000000000000375E-35.5000000000000604E-35.7999999999999163E-34.1999999999999815E-35.8000000000000274E-35.8000000000000274E-35.2999999999999714E-35.6000000000000494E-34.2999999999999705E-35.0000000000000044E-35.0000000000000044E-35.2999999999999714E-36.5000000000000613E-36.6999999999999282E-34.6000000000000485E-34.9000000000000155E-35.1999999999999824E-35.0999999999999934E-34.3999999999999595E-34.7000000000000375E-36.1999999999999833E-36.5000000000000613E-35.4999999999999494E-35.2999999999999714E-35.0000000000000044E-36.0999999999999943E-35.4000000000000714E-35.0000000000000044E-34.5999999999999375E-34.7000000000000375E-35.0999999999999934E-35.0999999999999934E-34.0999999999999925E-35.2999999999999714E-34.9000000000000155E-35.0999999999999934E-35.9000000000000163E-34.8000000000000265E-35.6999999999999273E-35.4000000000000714E-34.3999999999999595E-33.6000000000000476E-34.3999999999999595E-35.5000000000000604E-35.4999999999999494E-33.8000000000000256E-34.9000000000000155E-34.7999999999999154E-36.3000000000000833E-36.3999999999999613E-36.7000000000000393E-35.5999999999999384E-35.5000000000000604E-35.9000000000000163E-36.3999999999999613E-36.0000000000000053E-36.2999999999999723E-35.7000000000000384E-35.9000000000000163E-35.4999999999999494E-35.6000000000000494E-36.1999999999999833E-36.1999999999999833E-35.9000000000000163E-36.0999999999999943E-36.0999999999999943E-35.5999999999999384E-35.9000000000000163E-33.0999999999999917E-33.7000000000000366E-33.8000000000000256E-33.5999999999999366E-31.0000000000000009E-31.6000000000000458E-32.0999999999999908E-3

ProQ2

Probality

Selection probablity of CGDT-TS for T0654

Probablity0.664796999999999970.663507999999999990.666985999999999970.6654080.660626999999999960.662379999999999970.660475999999999950.658676000000000040.661432000000000020.659923000000000040.658213999999999970.656799000000000020.655259999999999950.656537999999999950.655413000000000020.655312000000000010.657503999999999980.655947999999999980.655849000000000020.654560000000000030.654136000000000050.654044999999999990.653204000000000010.651553999999999970.654379000000000040.652166000000000020.6523890.652445999999999970.653175999999999980.651386000000000020.651321000000000040.649383999999999960.6492040.650818000000000010.648526999999999960.649217999999999960.646105999999999960.647120999999999950.643121000000000050.642440000000000010.6448180.644530000000000050.641760.643124000000000030.640317000000000030.641109999999999960.639997000000000040.642955000000000050.639491999999999950.639349999999999970.639368000000000050.642256999999999970.641564000000000020.637778999999999980.636665999999999950.638843000000000050.636413000000000010.635587000000000010.637256999999999960.634897999999999960.635118999999999990.636568000000000020.634881000000000030.632546000000000050.634657000000000030.634187999999999970.634541000000000020.631920000000000040.630928000000000040.629923000000000010.629627000000000050.630577999999999970.629361999999999980.626554000000000060.628858000000000030.626232000000000010.629252999999999950.627394000000000010.629597999999999990.6301310.627118999999999980.6261660.629271000000000020.629692999999999950.629402000000000020.625020000000000020.625295999999999960.624009000000000040.6274670.623230999999999980.623939000000000020.623898999999999980.623650999999999960.620993000000000020.624669999999999950.618866000000000030.6176410.619454999999999980.621789999999999950.620812000000000030.617468000000000020.617534000000000030.6172010.616277999999999990.607635999999999950.606905999999999950.606049999999999980.603326999999999950.602682000000000050.601434000000000020.5980550.596882000000000020.593148000000000010.591567999999999980.584269999999999960.583115999999999970.5847040.588864000000000050.583806000000000050.583899999999999970.582431999999999950.581879999999999950.578389999999999960.583825999999999960.584100999999999980.581327000000000040.581158999999999980.576447999999999960.573559000000000040.570795000000000050.572281999999999960.570778000000000010.564817000000000010.567209999999999990.565586000000000030.564914999999999940.565262000000000040.564170999999999980.556061999999999950.551096000000000030.558814000000000030.5514270.546923999999999970.553835999999999990.544205000000000050.550601000000000010.544231000000000020.528208000000000010.528799000000000020.528614999999999950.525105999999999960.522641000000000020.508091999999999990.502692000000000030.486565000000000030.3269880.327077000000000010.326114999999999990.324460000000000030.322257999999999990.321892000000000010.319697999999999980.319525999999999980.317790999999999990.315433000000000020.262849999999999970.262392999999999990.262417999999999980.259788000000000020.2596330.254599000000000020.2544670.253803999999999970.253601999999999990.250512999999999990.233523000000000010.154989999999999990.154824999999999990.154790000000000010.1475050.145377000000000010.145467999999999996.4999999999999997E-36.4999999999999997E-36.5000000000000006E-36.4999999999999988E-36.3999999999999994E-36.4999999999999988E-36.4000000000000029E-36.4999999999999988E-36.4000000000000029E-36.4999999999999919E-36.4000000000000029E-36.4000000000000029E-36.4000000000000029E-36.4000000000000029E-36.399999999999989E-36.4000000000000029E-36.4000000000000029E-36.4000000000000029E-36.4000000000000029E-36.4000000000000029E-36.399999999999989E-36.399999999999989E-36.4000000000000168E-36.3E-36.399999999999989E-36.4000000000000168E-36.399999999999989E-36.3E-36.399999999999989E-36.4000000000000168E-36.3E-36.399999999999989E-36.3E-36.4000000000000168E-36.3E-36.3E-36.3E-36.399999999999989E-36.2000000000000111E-36.2999999999999723E-36.3000000000000278E-36.2999999999999723E-36.3000000000000278E-36.1999999999999833E-36.3000000000000278E-36.1999999999999833E-36.2999999999999723E-36.3000000000000278E-36.1999999999999833E-36.3000000000000278E-36.1999999999999833E-36.3000000000000278E-36.1999999999999833E-36.2999999999999723E-36.2000000000000388E-36.1999999999999833E-36.1999999999999833E-36.2000000000000388E-36.2999999999999723E-36.1999999999999833E-36.2000000000000388E-36.1999999999999833E-36.1999999999999833E-36.0999999999999943E-36.2000000000000388E-36.1999999999999833E-36.1999999999999833E-36.2000000000000388E-36.1999999999999833E-36.0999999999999943E-36.1999999999999833E-36.0999999999999943E-36.2000000000000388E-36.0999999999999943E-36.0999999999999943E-36.0999999999999943E-36.1999999999999833E-36.0999999999999943E-36.0999999999999943E-36.2000000000000388E-36.0999999999999943E-36.0999999999999943E-36.1999999999999833E-36.0999999999999943E-36.1999999999999833E-36.0999999999999943E-36.0999999999999943E-36.0999999999999943E-36.1000000000001053E-36.0999999999999943E-36.0999999999999943E-36.0999999999999943E-36.0000000000000053E-36.0999999999999943E-36.0999999999999943E-36.0000000000000053E-36.0999999999999943E-36.0000000000000053E-36.0999999999999943E-36.0999999999999943E-36.0000000000000053E-36.0000000000000053E-36.0000000000000053E-36.0999999999999943E-35.9000000000000163E-35.8999999999999053E-35.9000000000000163E-35.9000000000000163E-35.9000000000000163E-35.9000000000000163E-35.8000000000000274E-35.7999999999999163E-35.8000000000000274E-35.8000000000000274E-35.6999999999999273E-35.7000000000000384E-35.7000000000000384E-35.8000000000000274E-35.6999999999999273E-35.7000000000000384E-35.5999999999999384E-35.7000000000000384E-35.7000000000000384E-35.6999999999999273E-35.7000000000000384E-35.7000000000000384E-35.5999999999999384E-35.7000000000000384E-35.5999999999999384E-35.5000000000000604E-35.5999999999999384E-35.6000000000000494E-35.4999999999999494E-35.5000000000000604E-35.5999999999999384E-35.5000000000000604E-35.4999999999999494E-35.5000000000000604E-35.3999999999999604E-35.4000000000000714E-35.4999999999999494E-35.2999999999999714E-35.4000000000000714E-35.3999999999999604E-35.2999999999999714E-35.4000000000000714E-35.2999999999999714E-35.1999999999999824E-35.0999999999999934E-35.1999999999999824E-35.0999999999999934E-35.0999999999999934E-35.0000000000000044E-34.9000000000000155E-34.7000000000000375E-33.1999999999999806E-33.1999999999999806E-33.1999999999999806E-33.2000000000000917E-33.0999999999999917E-33.1999999999999806E-33.0999999999999917E-33.0999999999999917E-33.0999999999999917E-33.0999999999999917E-32.5000000000000577E-32.5999999999999357E-32.6000000000000467E-32.4999999999999467E-32.5000000000000577E-32.4999999999999467E-32.5000000000000577E-32.4999999999999467E-32.5000000000000577E-32.3999999999999577E-32.2999999999999687E-31.5000000000000568E-31.4999999999999458E-31.5000000000000568E-31.4999999999999458E-31.4000000000000679E-31.3999999999999568E-3

CGDT-TS

Probablity

P-2P-WMDSC-2P-WMDSP-3P-WMDSC-3P-WMDSP-3P-MODELLERC-3P-MODELLERP-2P-WMDS-MODELLERC-2P-WMDS-MODELLER41.733333333333334111.50312535.881250000000001109.715625212.80520833333333276.32916666666665368.359375428.05416666666667

Algorithms

Minites

Evaluation

P-2P-WMDSC-2P-WMDSP-3P-WMDSC-3P-WMDSP-3P-MODELLERC-3P-MODELLERP-2P-WMDS-MODELLERC-2P-WMDS-MODELLER1.65260416666666678.51385416666666651.34135416666666668.5667708333333331.05583333333333349.82249999999999981.523854166666666710.991562499999999Selection

P-2P-WMDSC-2P-WMDSP-3P-WMDSC-3P-WMDSP-3P-MODELLERC-3P-MODELLERP-2P-WMDS-MODELLERC-2P-WMDS-MODELLER0002.0833333333333335E-45.2083333333333333E-44.1666666666666669E-44.1666666666666669E-46.2500000000000001E-4Crossover

P-2P-WMDSC-2P-WMDSP-3P-WMDSC-3P-WMDSP-3P-MODELLERC-3P-MODELLERP-2P-WMDS-MODELLERC-2P-WMDS-MODELLER2.52020833333333362.636252.24593749999999972.404062500000000220.22395833333333217.80947916666666635.31083333333333531.813020833333333

Algorithms

Minutes

P-2P-WMDS EvaluationSelectionBuild Matrix Combination WMDS xyz2pdb1.652604166666666700.650312499999999960.223645833333333321.08468750.56156250000000008C-2P-WMDS EvaluationSelectionBuild Matrix Combination WMDS xyz2pdb8.513854166666666500.468854166666666680.2006251.54604166666666680.42072916666666665P-3P-WMDS EvaluationSelectionBuild Matrix Combination WMDS xyz2pdb1.341354166666666600.541666666666666630.207291666666666681.07427083333333330.42270833333333335C-3P-WMDS EvaluationSelectionBuild Matrix Combination WMDS xyz2pdb8.5667708333333332.0833333333333335E-40.620312500000000040.245625000000000011.19260416666666670.34552083333333333

Steps

Minutes

The GDT-TS of best model using ProQ2 evaluation

Initial BestT0678T0668T0675T0698T0673T0669T0696T0680T0654T0648T0657T0662T0700T0659T0709T06650.417200000000000020.429499999999999990.609600000000000030.647100000000000010.65320.683000000000000050.707500000000000020.763000000000000010.768199999999999990.822699999999999990.870299999999999960.888199999999999990.964300000000000050.969600000000000020.989600000000000040.99480000000000002P-2P-WMDST0678T0668T0675T0698T0673T0669T0696T0680T0654T0648T0657T0662T0700T0659T0709T06650.431800000000000020.442300000000000030.596500000000000030.670200000000000020.65320.70360.704999999999999960.833300000000000040.768199999999999990.834300000000000040.874099999999999990.917799999999999950.964300000000000050.972999999999999980.989600000000000040.99480000000000002P-3P-WMDST0678T0668T0675T0698T0673T0669T0696T0680T0654T0648T0657T0662T0700T0659T0709T06650.417200000000000020.429499999999999990.609600000000000030.647100000000000010.43950.683000000000000050.707500000000000020.763000000000000010.768199999999999990.822699999999999990.870299999999999960.888199999999999990.964300000000000050.969600000000000020.989600000000000040.99480000000000002P-3P-MODELLERT0678T0668T0675T0698T0673T0669T0696T0680T0654T0648T0657T0662T0700T0659T0709T06650.422099999999999980.429499999999999990.578899999999999970.672300000000000010.65320.716500000000000030.747500000000000050.763000000000000010.764599999999999950.831400000000000030.875900000000000010.934200000000000030.871399999999999950.9696000000000000210.99480000000000002P-2P-WMDS-MODELLER0.417200000000000020.442300000000000030.596500000000000030.661800000000000050.65320.724199999999999950.704999999999999960.763000000000000010.751800000000000020.842999999999999970.875900000000000010.901299999999999990.842899999999999980.969600000000000020.979199999999999960.99480000000000002

Targets

GDT-TS

The RMSD of best model using ProQ2 evaluation

Initial BestT0668T0678T0673T0675T0698T0648T0696T0680T0669T0654T0657T0662T0659T0700T0709T06657.5326.285.34900000000000024.40000000000000043.91599999999999993.49400000000000023.45299999999999983.2982.90799999999999992.54199999999999982.2541.48100000000000010.860999999999999990.802000000000000050.685000000000000050.626P-2P-WMDST0668T0678T0673T0675T0698T0648T0696T0680T0669T0654T0657T0662T0659T0700T0709T06656.40399999999999996.13900000000000025.34900000000000024.50900000000000033.98200000000000023.49400000000000023.45299999999999982.50199999999999982.87599999999999992.54199999999999982.2181.43300000000000010.80.802000000000000050.693999999999999950.626P-3P-WMDST0668T0678T0673T0675T0698T0648T0696T0680T0669T0654T0657T0662T0659T0700T0709T06657.5326.2811.064.40000000000000043.91599999999999993.49400000000000023.45299999999999983.2982.90799999999999992.54199999999999982.2541.48100000000000010.860999999999999990.802000000000000050.685000000000000050.626P-3P-MODELLERT0668T0678T0673T0675T0698T0648T0696T0680T0669T0654T0657T0662T0659T0700T0709T06657.51299999999999997.05400000000000035.30199999999999965.0484.01900000000000013.46499999999999993.89199999999999993.9442.853.06400000000000012.1941.31600000000000010.797000000000000041.31299999999999990.510.67100000000000004P-2P-WMDS-MODELLER6.27700000000000016.285.34900000000000024.50900000000000033.98200000000000022.93599999999999994.12699999999999983.2982.90799999999999992.81700000000000022.22900000000000011.45300000000000010.817999999999999951.6410.768000000000000020.626

Targets

RMSD

The average GDT-TS of best 10 models using ProQ2 evaluation

Initial BestT0668T0678T0673T0675T0698T0669T0696T0680T0654T0648T0657T0662T0700T0659T0709T06650.374690000000000020.391230000000000020.490719999