Analysis of Haloacid Dehalogenase superfamily members and functional assignment of

Structural Genomics proteins

by Mong Mary Touch

B.S. in Chemistry, University of Massachusetts Lowell

A thesis submitted to

The Faculty of

the College of Science of

Northeastern University

in partial fulfillment of the requirements

for the degree of Master of Science

December 12, 2014

Thesis directed by Mary Jo Ondrechen

Professor of Chemistry and Chemical Biology

Acknowledgments

I would like to thank my advisor, Dr. Mary Jo Ondrechen, who allowed me to join her laboratory and work on this project, and who provided valuable guidance. Special thanks go to my mentee and friend, Eve Mozur, who worked very hard on this project with me. I would also like to thank my committee members, Dr. Penny Beuning and Dr. Carla Mattos, for their assistance and their time in reviewing my thesis. I want to thank previous and current ORG members Dr. Joslynn Lee, Dr. Ramya Parasuram, Lisa Ngu, Caitlyn Mills, Timothy Coulther, Zhen Liu, and Jenifer Winters for their support and discussions. My project was financially supported by the National Science Foundation under grant CHE-1305655.

Abstract of Thesis

Dehalogenases are enzymes that can degrade certain types of organic pollutants, particularly halogenated hydrocarbons, into non-toxic compounds. In this research, the members of the haloacid dehalogenase (HAD) superfamily are analyzed to predict their biochemical function, with the ultimate goal of finding enzymes for possible application to the bioremediation of environmental contaminants. The HAD superfamily consists mainly of phosphatases, dehalogenases, and a large number of protein structures of unknown or uncertain function from Structural Genomics (SG). To study the HAD superfamily, the computational methods Partial Order Optimum Likelihood (POOL) and Structurally Aligned Local Sites of Activity (SALSA) are utilized. From this study, the SG protein RSc1362 from Ralstonia solanacearum (PDB ID 3UMB) is predicted to function as an L-2 haloacid dehalogenase, while the HAD/COF-like hydrolase from Plasmodium vivax (PDB ID 2B30), a hypothetical protein from Geobacillus kaustophilus (PDB ID 2PQ0), a putative phosphatase from Eubacterium rectale (PDB ID 3DAO), and a haloacid dehalogenase-like hydrolase from Bacteroides thetaiotaomicron (PDB ID 3NIW) are predicted to function as sugar phosphatases.

Table of Contents

Acknowledgments
Abstract of Thesis
Table of Contents
List of Figures
List of Tables
Introduction
Methods
Results
Conclusions
References

List of Figures

Figure 1  Capless HAD enzyme, deoxy-D-mannose-octulosonate 8-phosphate phosphatase
Figure 2  Catalytic mechanism of phosphatases and dehalogenases of the HAD superfamily
Figure 3  Structural alignment and functional residue alignment of L-2 haloacid dehalogenases
Figure 4  Structural alignment and functional residue alignment of sugar phosphatases
Figure 5  Structural alignment and functional residue alignment of C-terminal domain phosphatases

List of Tables

Table 1  Molecule name and source organism of proteins of known function in HAD superfamily
Table 2  Names and source organism of the studied SG proteins
Table 3  Catalytic residues of representative proteins of the HAD superfamily obtained from the literature
Table 4  Consensus signatures of proteins of three subgroups of HAD superfamily
Table 5  Alignment of the predicted residues for the SG proteins with the consensus signatures for three HAD subgroups
Table 6  SALSA table
Table 7  Functional residue alignments of high scoring SG proteins to the consensus signatures

Analysis of Haloacid Dehalogenase superfamily members and functional

assignment of Structural Genomics proteins

I. Introduction

Halogenated hydrocarbons are widely used in industry, in agriculture, and in household items [1]. They can be found in silicones, pesticides, disinfectants, air fresheners, and rug cleaners. Halogenated hydrocarbons such as trichloroethane (TCA), trichloroethene (TCE), and perchloroethylene (PCE) are known to be the most common environmental pollutants found in soil and groundwater in the United States [2]. According to the United States Environmental Protection Agency, TCE, a volatile and colorless organic compound, is a priority contaminant because of its wide usage and potential carcinogenicity [3]. Contamination caused by halogenated hydrocarbons can be detoxified by dehalogenation. Reductive dehalogenation can degrade TCE to yield ethane and hydrochloric acid [4]. Because of the importance of dehalogenation in bioremediation, the haloacid dehalogenase (HAD) superfamily is studied here.

The members of the HAD superfamily exist in all three superkingdoms of life [5]. These enzymes are approximately 200 to 250 amino acids long. They adopt a Rossmannoid fold, a three-layered α/β sandwich with a repeating α/β unit [5], shown in Figure 1. The α/β sequence of the core domain is highly conserved throughout the superfamily. Some HAD enzymes contain a cap domain, a sequence insertion into the core domain that functions as a dynamic lid and determines the accessibility of the active site [6]. The core catalytic domain contains loop regions that carry four conserved motifs. Loop I has the conserved nucleophilic Asp, and loop II has the conserved phosphate-binding residue, either Ser or Thr [7]. Loop III has a Lys or an Arg that interacts with the phosphoryl group of the substrate, and loop IV binds a magnesium cofactor. Some proteins of the HAD superfamily are capless, and they catalyze reactions with large substrates such as proteins and DNA. An example of a capless HAD enzyme, deoxy-D-mannose-octulosonate 8-phosphate phosphatase (PDB ID 1K1E) [8], is shown in Figure 1.

Figure 1: A capless HAD enzyme, deoxy-D-mannose-octulosonate 8-phosphate phosphatase, depicting the Rossmannoid fold (PDB ID 1K1E) [8]. The image was built using YASARA software [9].

The enzymes that belong to the HAD superfamily include the P-type ATPases, phosphatases, epoxide hydrolases, and L-2 haloacid dehalogenases; the latter catalyze dehalogenation of a wide variety of substrates [10]. Most HAD enzymes are phosphatases. Phosphatases catalyze the hydrolysis of a phosphoryl group of the substrate in two steps, as shown in Figure 2. First, a conserved nucleophilic Asp of the enzyme attacks the electrophilic phosphorus atom, forming a phosphoryl intermediate [5]. Second, a water molecule hydrolyzes the intermediate to regenerate the enzyme [5]. The nucleophilic attack by Asp is facilitated by coordination of the metal ion, whose positive charge neutralizes the negative charges on the Asp and the phosphate group. The metal ion also assists in stabilizing the enzyme structure. Similarly, the dehalogenases require nucleophilic attack by Asp to form an aspartyl intermediate, which is hydrolyzed by a water molecule; the chloride ion is released in the process, as also shown in Figure 2.

Figure 2: Catalytic mechanism of phosphatases and dehalogenases of the HAD superfamily [5]. (Two panels: Phosphatases; Dehalogenases.)

Computational methods for analyzing the HAD superfamily

HAD is a large superfamily that consists of many enzymes, some of which have unknown or putative biochemical functions. These proteins of unknown or putative function are primarily Structural Genomics (SG) proteins, i.e., protein structures determined by the Protein Structure Initiative (PSI) or other high-throughput structure determination projects. The functions of SG proteins are often assigned based on sequence or structure similarity to one of the proteins of known function belonging to the HAD superfamily. These functional assignments are typically obtained using sequence comparison methods such as PSI-BLAST [11].

The study of SG proteins can help to increase understanding of the relationship between the structure and function of proteins. However, using three-dimensional structures to determine the biological function of proteins has proved to be much more difficult than was envisioned when the PSI was first proposed [12]. To determine the function of SG proteins, the computational methods Partial Order Optimum Likelihood (POOL) and Structurally Aligned Local Sites of Activity (SALSA) are utilized. POOL is used to predict the amino acid residues in a protein structure that participate in the biochemical function, and SALSA is used to assign function according to the local spatial arrangement of the predicted residues at the active site. The development, application, and verification of computational methods that can predict protein function reliably from the three-dimensional structure will add tremendous value to Structural Genomics data. This thesis represents an important step toward that goal.

POOL is a monotonicity-constrained maximum likelihood approach that uses the three-dimensional structure of proteins to predict important residues involved in ligand recognition or catalysis [13]. POOL is a machine learning method that uses input features such as metrics from theoretical microscopic anomalous titration curve shapes (THEMATICS) [14], ConCavity pocket features (ConCavity) [15], and INformation-theoretic TREe traversal for Protein functional site IDentification (INTREPID) [16] to make the predictions. The three methodologies used to generate POOL input features, THEMATICS, ConCavity, and INTREPID, are now reviewed briefly.

THEMATICS is a computational method that predicts the ionizable residues (Arg, Lys, Asp, Glu, His, Cys, and Tyr, plus the N- and C-termini) associated with Brønsted acid-base chemistry in the active site of enzymes [14]. Using only the three-dimensional structure of a protein, the ionizable residues in the active site can be identified because their theoretical titration curves deviate in shape from the usual Henderson-Hasselbalch curves [14]. THEMATICS obtains the theoretical titration curves from an approximate solution of the Poisson-Boltzmann equations [17], followed by Monte Carlo sampling with HYBRID [18]. Statistical and machine learning methods are then used to calculate which residues are functionally significant in or near the active site of an enzyme.
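As a point of reference for the "usual" curve shape mentioned above, the following minimal Python sketch computes the ideal Henderson-Hasselbalch titration curve for a single, non-interacting ionizable group. It is not the THEMATICS code and does not reproduce the Poisson-Boltzmann or Monte Carlo steps; the function names and the example pKa of 3.9 (used here as a nominal model value for an Asp side chain) are assumptions introduced for illustration.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Ideal Henderson-Hasselbalch protonated fraction of one isolated group."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def titration_curve(pka: float, ph_values):
    """Return (pH, protonated fraction) pairs for the ideal sigmoidal curve."""
    return [(ph, protonated_fraction(ph, pka)) for ph in ph_values]


# Illustrative use: pH grid from 0 to 14 in steps of 0.5, nominal pKa of 3.9.
ph_grid = [i * 0.5 for i in range(29)]
for ph, fraction in titration_curve(3.9, ph_grid):
    print(f"pH {ph:4.1f}  protonated fraction {fraction:.3f}")
```

The ideal curve is a sharp sigmoid centered on the pKa. The residues of interest to THEMATICS are those whose computed curves depart from this shape, for example by remaining partially protonated over a wide pH range.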

The structure-only version of ConCavity is an algorithm that uses the surface topology of the protein structure to evaluate residues based on their location in protein surface cavities [15]. ConCavity assigns scores to residues in the protein based on their likelihood of ligand binding, in order to predict the location of the ligand-binding atoms in space [15].

INTREPID is a method that uses sequence information, the phylogenetic tree, and the Jensen-Shannon (JS) divergence [19] to predict the conserved catalytic positions of the query proteins [16]. The positional conservation scores obtained from the calculation of JS divergence are adjusted to consider the scores of other positions [16]. Thus, the combination of THEMATICS, ConCavity, and INTREPID enhances the performance of POOL in the prediction of important residues in catalysis.
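The Jensen-Shannon divergence itself is simple to compute. The sketch below, in Python, measures the JS divergence (in bits) between the amino acid distribution of one alignment column and a background distribution. It illustrates only the divergence measure: the tree traversal and score adjustment performed by INTREPID are not reproduced, and the helper names, the two-letter alphabet, and the uniform background are assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def distribution(column, alphabet):
    """Relative frequency of each alphabet symbol in an alignment column."""
    counts = Counter(column)
    total = sum(counts[a] for a in alphabet)
    return [counts[a] / total for a in alphabet]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Illustrative use with a toy two-letter alphabet and a uniform background.
alphabet = "DE"
background = [0.5, 0.5]
conserved_column = distribution("DDDDDD", alphabet)
variable_column = distribution("DEDEDE", alphabet)
print(js_divergence(conserved_column, background))
print(js_divergence(variable_column, background))
```

A fully conserved column diverges more from the background than a column that mirrors it, which is the sense in which a high JS divergence flags a conserved position.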

Table 1: Molecule name and source organism of proteins of known function in the HAD superfamily, obtained from the RCSB Protein Data Bank [20].

Subgroup                        PDB ID  Molecule name  Organism
L-2 Haloacid Dehalogenase       1JUD    L-DEX YL       Pseudomonas sp. YL
L-2 Haloacid Dehalogenase       1AQ6    DhlB           Xanthobacter autotrophicus
L-2 Haloacid Dehalogenase       2NO4    DehIVa         Burkholderia cepacia
Sugar Phosphatase               1YMQ    BT4131         Bacteroides thetaiotaomicron
Sugar Phosphatase               1U02    T6PP           Thermoplasma acidophilum
Sugar Phosphatase               1TJ3    SPP            Synechocystis sp. PCC 6803
C-terminal Domain Phosphatase   3EF0    Fcp1           Schizosaccharomyces pombe
C-terminal Domain Phosphatase   2GHQ    Scp1           Homo sapiens

Table 2: Names and source organisms of the SG proteins that are studied here.

PDB ID  Molecule name                                  Organism
1PW5    Putative nagD protein                          Thermotoga maritima
1QYI    Q8NW41                                         Staphylococcus aureus subsp. aureus MW2
1YV9    Hypothetical protein                           Enterococcus faecalis
1ZJJ    Hypothetical protein PH1952                    Pyrococcus horikoshii
2B30    HAD/COF-like hydrolase                         Plasmodium vivax
2HI0    Putative phosphoglycolate phosphatase          Gallus gallus
2HOQ    Probable haloacid dehalogenase                 Pyrococcus horikoshii
2HSZ    Novel predicted phosphatase                    Haemophilus somnus
2PQ0    Hypothetical protein                           Geobacillus kaustophilus
2PR7    Haloacid dehalogenase/epoxide hydrolase family Corynebacterium glutamicum
2YBD    Hydrolase, haloacid dehalogenase-like family   Pseudomonas fluorescens Pf-5
3DAO    Putative phosphatase                           Eubacterium rectale
3EPR    Hydrolase, haloacid dehalogenase-like family   Streptococcus agalactiae serogroup V
3FVV    Uncharacterized protein                        Bordetella pertussis
3M9L    Hydrolase, haloacid dehalogenase-like family   Pseudomonas protegens Pf-5
3NIW    Haloacid dehalogenase-like hydrolase           Bacteroides thetaiotaomicron
3QNM    Haloacid dehalogenase-like hydrolase           Bacteroides thetaiotaomicron
3R09    Hydrolase, haloacid dehalogenase-like family   Pseudomonas fluorescens Pf-5
3UMB    RSc1362                                        Ralstonia solanacearum
4EEK    Beta-phosphoglucomutase-related protein        Deinococcus radiodurans
4GXT    A conserved functionally unknown protein       Anaerococcus prevotii

For each protein structure, the predicted set of residues involved in catalysis consists of a spatially localized arrangement of specific types of amino acids. The challenge of matching residues of the SG proteins to those of the proteins of known function can be overcome using a matching and scoring method, SALSA. First, SALSA establishes consensus signatures, i.e., local spatial arrangements of catalytically important residues, based on POOL predictions for proteins of common known function. SALSA then predicts the biochemical function of proteins of unknown function by matching residues at the predicted catalytic site with the consensus signatures for the known functional types. Finally, SALSA calculates a score that measures the quality of the local structural match [21]. From the scores, the annotated function of an SG protein may be confirmed or challenged; where the original functional assignment is found to be incorrect, a more likely function may be assigned. Functional assignments made by POOL and SALSA for SG proteins will need experimental verification to confirm the annotation. In this study, the focus is on HAD superfamily proteins of known function and selected SG proteins. The HAD proteins of known function used to obtain the consensus signatures are listed in Table 1, and the SG proteins evaluated here are listed in Table 2.

II. Methods

2.1. Analysis of the proteins of known function of HAD superfamily

The members of the HAD superfamily are classified into subgroups according to their biochemical reactions and functions. A few representative proteins of known function, with sequence diversity within each subgroup, were chosen for this study. Because of time constraints, only three subgroups of the HAD superfamily were analyzed: the L-2 haloacid dehalogenases, the sugar phosphatases, and the C-terminal domain phosphatases. Within each subgroup, representative proteins with low sequence similarity to one another were preferred, to avoid bias in the determination of spatially overlapping residues. As shown in Table 1, a total of eight proteins of the HAD superfamily were chosen: three L-2 haloacid dehalogenases, three sugar phosphatases, and two C-terminal domain phosphatases. The reaction mechanisms of these dehalogenases and phosphatases are shown in Figure 2. The previously reported functionally important residues for each protein were obtained from literature reports and the PDBsum database [22] and are listed in Table 3.

Table 3: The catalytic residues of representative proteins of the HAD superfamily obtained from the literature.

PDB ID  Catalytic/Important residues

1JUD D10, Y12, T14, R41, S118, K151, Y157, S175, N177, D180

1AQ6 D8, Y10, T12, R39, L113, S114, N115, G116, K147, Y153, S171, N173, D176

2NO4 D11, T15, R42, S119, K152, Y158, S176, N178, D181

1YMQ D8, D10, T43, K188, D211, N214, D215

1U02 D7, D9, T45, R47, K161, D179, D180, D183

1TJ3 D9, D11, T41, G42, K163, N189

3EF0 D170, L171, D172, T174, K280, D297, D298

2GHQ D96, D98, Y158, R178, K190, D206

Often, not all of the important residues located in or near the binding pocket have been tested and reported in the literature. An alternative way to find most or all of the important residues in an enzyme is to use the POOL method, which predicts the residues involved in ligand recognition and catalysis. Using the set of protein structures of known function, a Consensus Signature (CS) is identified for each HAD subgroup in order to provide a means of recognizing that subgroup. A CS is defined by the spatial alignment of POOL-predicted amino acid residues of the same type in all or most of the proteins of known function within that functional subgroup. The alignments of proteins within each subgroup were made using the PDBeFold server [23] and the Chimera software [24].
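The definition of a consensus signature can be sketched in a few lines of Python. This is an illustrative reading of the definition above, not the SALSA implementation: the function name, the data layout, the position indices, and the 0.6 cutoff standing in for "all or most" are assumptions. Residue labels follow the convention of Table 4, in which an uppercase letter marks a POOL-predicted residue and a lowercase letter marks a residue that was not predicted.

```python
from collections import Counter


def consensus_signature(aligned, min_fraction=0.6):
    """Return {position: residue type} for positions where one residue type is
    POOL-predicted (uppercase label) in at least min_fraction of the proteins.

    aligned maps protein ID -> {aligned position: label such as "D10" or "t15"}.
    """
    n_proteins = len(aligned)
    positions = sorted({pos for residues in aligned.values() for pos in residues})
    signature = {}
    for pos in positions:
        # Count residue types only where the residue was POOL-predicted.
        predicted = Counter(
            residues[pos][0]
            for residues in aligned.values()
            if pos in residues and residues[pos][0].isupper()
        )
        if not predicted:
            continue
        residue_type, count = predicted.most_common(1)[0]
        if count / n_proteins >= min_fraction:
            signature[pos] = residue_type
    return signature


# Labels taken from the first entries of the L-2 haloacid dehalogenase rows of
# Table 4; the position indices 1-5 are hypothetical and used only for illustration.
dehalogenases = {
    "1JUD": {1: "D10", 2: "Y12", 3: "G13", 4: "T14", 5: "s175"},
    "1AQ6": {1: "D8", 2: "Y10", 3: "G11", 4: "T12", 5: "s171"},
    "2NO4": {1: "D11", 2: "Y13", 3: "G14", 4: "t15", 5: "s176"},
}
print(consensus_signature(dehalogenases))
```

With these inputs, the first four positions enter the signature (the Thr position is predicted in two of three proteins), while the fifth is excluded because the Ser is not POOL-predicted in any of the three structures.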

Important residues in the active site of proteins are obtained using the POOL method. At a minimum, POOL requires the three-dimensional structure of the protein; if a sufficient number of homologues exists, then INTREPID can be used to improve the accuracy of the functional residue predictions. The three input types, THEMATICS, ConCavity, and INTREPID, are used for the POOL prediction. For present purposes, the top 10% of the residues in the POOL ranking are considered significant for participation in catalysis. The predicted set of important residues is compared with the consensus signatures obtained for the proteins of known function. The output from POOL is used for the SALSA scoring. SALSA scores proteins by applying a scoring matrix to the aligned residues in the local spatial region of the predicted active site. For example, proteins within a subgroup, whose aligned active site residues are similar to each other or to the consensus signature, will get a high score. In contrast, proteins that belong to different subgroups will get a low score because their aligned active site residues do not match well.
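The 10% cutoff on the POOL ranking can be expressed as a short Python sketch. The function name and the choice to round the cutoff up to a whole residue are assumptions; the text above does not state how a fractional count is handled. Integer arithmetic is used so that the cutoff does not depend on floating-point rounding.

```python
def top_percent(ranked_residues, percent=10):
    """Return the leading `percent` of a ranked list (best first), rounding up."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    # Integer ceiling of len * percent / 100.
    n_keep = (len(ranked_residues) * percent + 99) // 100
    return ranked_residues[:n_keep]


# Illustrative use with a hypothetical ranked list of 25 residue labels.
ranking = [f"R{i}" for i in range(1, 26)]
print(top_percent(ranking))
```

For a protein of 220 residues this keeps the 22 top-ranked residues; for the 25-residue toy list it keeps three.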

2.2. Analysis of Structural Genomics proteins

The Structural Genomics proteins in the superfamily can be found by submitting all of the representative protein structures to a structure comparison server, such as the DALI server [25]. The DALI server outputs a list of protein structures, along with their PDB IDs, structure comparison scores, and percent sequence identities relative to the input protein. The SG proteins with low percent sequence identity relative to the representative proteins are the most interesting in this study, because two proteins with high percent identity are already likely to have the same function. POOL was run for each SG protein to obtain the important residues in the active site; as for the proteins of known function, the top 10% of the ranked residues was used. All the SG proteins were aligned with the representative proteins using the PDBeFold server. Each SG protein was scored against each representative protein, which generates a scoring table. The scores were calculated using the BLOSUM62 matrix, a matrix that scores aligned residues in pairwise fashion [26]. The number 62 in the name indicates that protein sequences more than 62% identical were clustered together to generate the matrix, which is based on amino acid substitution probabilities. The SG proteins with a score above the threshold (>0.4) are predicted to have the same function as the representative proteins.
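The scoring step can be illustrated with a minimal Python sketch. It is not the SALSA scoring function: the thesis text does not give the formula, so the normalization used here (the raw pairwise sum divided by the larger of the two self-scores, so that identical signatures score 1) and the handling of unshared positions are assumptions. The dictionary holds only a small excerpt of BLOSUM62 entries; a full matrix would be needed in practice. The signatures and position indices in the example are hypothetical.

```python
# Excerpt of BLOSUM62 substitution scores (the matrix is symmetric).
BLOSUM62_EXCERPT = {
    ("D", "D"): 6, ("G", "G"): 6, ("T", "T"): 5, ("Y", "Y"): 7,
    ("R", "R"): 5, ("K", "K"): 5, ("N", "N"): 6, ("S", "S"): 4,
    ("D", "Y"): -3, ("D", "N"): 1, ("S", "T"): 1, ("K", "R"): 2,
    ("D", "G"): -1, ("G", "T"): -2,
}


def pair_score(a, b):
    """Look up a substitution score in either orientation (KeyError if absent)."""
    a, b = a.upper(), b.upper()
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def normalized_score(sig_a, sig_b):
    """Assumed normalization: raw score over shared positions divided by the
    larger self-score, so two identical signatures score exactly 1.0."""
    shared = sorted(set(sig_a) & set(sig_b))
    if not shared:
        return 0.0
    raw = sum(pair_score(sig_a[p], sig_b[p]) for p in shared)
    self_a = sum(pair_score(r, r) for r in sig_a.values())
    self_b = sum(pair_score(r, r) for r in sig_b.values())
    return raw / max(self_a, self_b)


def same_function(score, threshold=0.4):
    """Apply the >0.4 threshold used in this study."""
    return score > threshold


# Hypothetical aligned signatures: position index -> residue type.
reference = {1: "D", 2: "Y", 3: "G", 4: "T"}
query = {1: "D", 2: "D", 3: "G", 4: "T"}
score = normalized_score(reference, query)
print(round(score, 2), same_function(score))
```

Dividing by the larger self-score keeps the measure symmetric, which matches the symmetric layout of the scoring table, but other normalizations would satisfy that property equally well.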

III. Results

The members of the HAD superfamily were obtained from a search through the SCOP [27] and SFLD [28] databases. A few representative enzymes were chosen for each subgroup. These representative proteins were analyzed using POOL to obtain a ranked list of residues, the top-ranked residues being the most likely to participate in catalysis. The POOL-predicted residues were used to generate the consensus signatures for the known functional subgroups. These consensus signatures for the three subgroups are shown in Table 4. For each protein structure, the subunit used in the alignment is indicated, along with the PDB ID, in the first column. Each row in Table 4 represents a protein structure, grouped according to biochemical function. Each column represents an aligned spatial position. Residues previously reported to be important are shown in red. POOL-predicted residues are shown in upper case.

The Consensus Signatures of the three subgroups of the HAD superfamily shown in Table 4 were obtained from literature information and from the alignment of the POOL-predicted residues of the proteins of known function within each subgroup. From this alignment of the proteins of known function, a total of 23 different spatial positions were found to be important for one or more of the three functional types. The alignments of the 21 SG proteins to the three subgroups were generated based upon the alignments of all SG proteins to each subgroup individually and as a whole. The resulting local alignments are shown in Table 5.

Table 4: The consensus signatures of proteins of three subgroups of the HAD superfamily. POOL-predicted residues are shown in upper case. Catalytic residues previously reported in the literature are shown in red. Rows represent individual protein structures, with proteins of common function grouped together. Columns represent aligned spatial positions.

Position

PDB ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

1JUD(A) D10 Y12 G13 T14 R41 Y91 L117 S118 N119 G120 K151 y157 s175 N177 D180

1AQ6(A) D8 Y10 G11 T12 R39 Y89 l113 S114 N115 g116 K147 Y153 s171 N173 D176

2NO4(A) D11 Y13 G14 t15 R42 Y92 l118 s119 n120 g121 K152 Y158 s176 N178 D181

1YMQ(A) D8 D10 G11 T12 T43 G44 R45 K188 D211 G212 N214 D215

1U02(A) D7 D9 G10 T11 T45 G46 R47 K161 D179 D180 T182 D183

1TJ3(A) D9 D11 n12 t13 T41 G42 r43 K163 D186 S187 N189 D190

3EF0(A) D170 l171 D172 T174 T243 Y249 r271 K280 D297 D298

2GHQ(B) D96 L97 D98 t100 t152 Y158 R178 K190 D206 N207

Table 5: The alignment of the predicted residues for the SG proteins with the consensus signatures for three HAD subgroups. POOL-predicted residues are shown in upper case. Previously reported catalytic residues are shown in red.

Position

PDB ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

1JUD(A) D10 Y12 G13 T14 R41 Y91 L117 S118 N119 G120 K151 y157 s175 N177 D180

1AQ6(A) D8 Y10 G11 T12 R39 Y89 l113 S114 N115 g116 K147 Y153 s171 N173 D176

2NO4(A) D11 Y13 G14 t15 R42 Y92 l118 s119 n120 g121 K152 Y158 s176 N178 D181

1YMQ(A) D8 D10 G11 T12 T43 G44 R45 K188 D211 G212 N214 D215

1U02(A) D7 D9 G10 T11 T45 G46 R47 K161 D179 D180 T182 D183

1TJ3(A) D9 D11 n12 t13 T41 G42 r43 K163 D186 S187 N189 D190

3EF0(A) D170 l171 D172 T174 T243 Y249 r271 K280 D297 D298

2GHQ(B) D96 L97 D98 t100 t152 Y158 R178 K190 D206 N207

1PW5(A) D11 M12 D13 G14 T15 f43 T44 N45 N46 s47 d53 m81 n185 p186 v189 G207 D208 R209 D213

1QYI(A) D7 V8 D9 G10 V11 l38 y209 a237 T238 G239 R240 p241 E244 a266 y276 n286 p287 Y290 G322 D323 S324 D327

1YV9(A) D11 L12 D13 G14 T15 v43 T44 N46 t47 t53 m82 k185 a186 m189 G207 D208 N209 D213

1ZJJ(A) D7 M8 D9 G10 V11 l39 T40 N42 s43 m49 m77 n189 e190 y193 G209 D210 R211 D215

2B30(C) D33 F34 D35 G36 T37 C67 T68 G69 R70 K225 G247 D248 A249 N251 D252

2HI0(A) D9 m10 D11 G12 T13 v40 q105 v131 S132 N133 K134 p135 a138 a166 p167 t170 G188 D189 S190 D193

2HOQ(A) D8 L9 D10 D11 T12 d39 f90 i116 T117 g119 n120 K123 K150 h152 p153 f156 G174 D175 R176 s179

2HSZ(A) D10 L11 D12 g13 t14 n41 c91 v117 T118 N119 K120 P121 H124 G144 h153 p154 f157 g175 D176 S177 D180

2PQ0(A) D9 D11 G12 T13 T43 G44 R45 K184 G206 D207 G208 N210 D211

2PR7(A) D7 Y8 a9 g10 v11 l39 S40 d42 p43 g44 g47 e75 e76 f79 D97 D98 s99 N102

2YBD(A) D12 M14 D15 G16 T17 l44 a67 L93 t94 r95 n96 l100 r122 l135 g154 d155 Y156 D159

3DAO(A) D8 I9 D10 g11 T12 C42 S43 G44 R45 Q46 K193 G215 D216 N217 N219 D220

3EPR(A) D11 L12 D13 G14 T15 V43 T44 N46 t47 s53 m81 n184 a185 m188 G206 D207 N208 D212

3FVV(A) D10 L11 D12 H13 T14 r42 m85 v114 T115 A116 T117 n118 v121 t137 S185 D186 S187 D190

3M9L(A) D12 m13 D14 G15 T16 l92 T93 R94 n95 l99 P128 G151 D152 y153 f155 D156

3NIW(A) D9 L10 D11 G12 T13 A42 S43 G44 R45 Y69 K196 G218 D219 g220 N222 D223

3QNM(A) D9 L10 D11 D12 T13 s40 p101 l126 S127 N128 g129 f130 l133 K160 r162 p163 f166 G184 D185 S186 A189

3R09(A) D12 M14 D15 G16 T17 l44 a67 L93 T94 r95 n96 l100 r122 l135 g154 d155 Y156 D159

3UMB(A) D10 a11 Y12 G13 T14 R41 Y95 l121 S122 N123 g124 m128 K155 a157 p158 Y161 s179 s180 N181 D184

4EEK(A) D12 L13 D14 G15 V16 e43 m84 g110 S111 n112 s113 r117 K146 h148 p149 Y152 E170 D171 S172 G175 g176

4GXT(A) D43 W44 d45 N46 T47 V240 S241 A242 S243 f244 i247 l269 v296 G316 D317 S318 G320 D321

Table 6: SALSA table showing scores for the aligned functional residues of the SG proteins against the representative proteins of the HAD subgroups. Several of the right-most columns of the table are missing because of space constraints, but are related by symmetry to the bottom rows of the table. High SALSA scores (>0.4) against the known functional subgroups, indicating a functional match, are highlighted in green.

PDB ID 1JUD(A) 1AQ6(A) 2NO4(A) 1YMQ(A) 1U02(A) 1TJ3(A) 3EF0(A) 2GHQ(B) 1PW5(A) 1QYI(A) 1YV9(A) 1ZJJ(A) 2B30(C) 2HI0(A) 2HOQ(A) 2HSZ(A) 2PQ0(A) 2PR7(A) 2YBD(A) 3DAO(A) 3EPR(A) 3FVV(A) 3M9L(A) 3NIW(A) 3QNM(A) 3R09(A) 3UMB(A) 4EEK(A) 4GXT(A)

1JUD(A) 1 1 1 -0.16 -0.15 -0.21 -0.46 -0.52 -0.01 0.01 0.09 0.12 -0.20 0.02 0.06 0.01 -0.12 0.00 0.06 -0.02 0.09 0.00 -0.06 -0.16 0.03 0.06 0.74 0.08 -0.25

1AQ6(A) 1 1 1 -0.16 -0.15 -0.21 -0.46 -0.52 -0.01 0.01 0.09 0.12 -0.20 0.02 0.06 0.01 -0.12 0.00 0.06 -0.02 0.09 0.00 -0.06 -0.16 0.03 0.06 0.74 0.08 -0.25

2NO4(A) 1 1 1 -0.16 -0.15 -0.21 -0.46 -0.52 -0.01 0.01 0.09 0.12 -0.20 0.02 0.06 0.01 -0.12 0.00 0.06 -0.02 0.09 0.00 -0.06 -0.16 0.03 0.06 0.74 0.08 -0.25

1YMQ(A) -0.16 -0.16 -0.16 1 0.84 0.86 -0.12 -0.25 0.11 0.02 0.15 0.00 0.70 0.04 -0.16 0.02 0.94 -0.10 -0.08 0.64 0.08 -0.02 0.11 0.71 -0.22 -0.08 -0.28 -0.19 0.06

1U02(A) -0.15 -0.15 -0.15 0.84 1 0.79 -0.12 -0.26 0.11 0.02 0.16 0.00 0.61 0.04 -0.16 0.02 0.79 -0.10 -0.09 0.59 0.10 -0.02 0.12 0.56 -0.22 -0.09 -0.27 -0.22 0.04

1TJ3(A) -0.21 -0.21 -0.21 0.86 0.79 1 -0.12 -0.26 0.05 0.00 0.09 -0.06 0.65 0.01 -0.12 0.00 0.82 -0.12 -0.15 0.59 0.03 0.06 0.05 0.58 -0.15 -0.15 -0.35 -0.22 0.19

3EF0(A) -0.46 -0.46 -0.46 -0.1 -0.1 -0.1 1 0.85 -0.17 -0.33 -0.06 -0.11 -0.17 -0.28 -0.19 -0.11 -0.10 -0.25 0.01 -0.21 -0.06 -0.03 0.07 -0.08 -0.31 0.01 -0.50 -0.35 -0.10

2GHQ(B) -0.52 -0.52 -0.52 -0.3 -0.3 -0.3 0.847 1 -0.32 -0.49 -0.21 -0.26 -0.28 -0.43 -0.35 -0.26 -0.25 -0.40 -0.14 -0.36 -0.21 -0.18 -0.08 -0.24 -0.46 -0.14 -0.57 -0.50 -0.25

1PW5(A) -0.01 -0.01 -0.01 0.11 0.11 0.05 -0.17 -0.32 1 0.39 0.54 0.62 0.19 0.41 0.35 0.40 0.17 0.13 0.21 0.25 0.61 0.04 0.35 0.16 0.27 0.21 0.18 0.20 0.10

1QYI(A) 0.01 0.01 0.01 0.02 0.02 0.00 -0.33 -0.49 0.39 1 0.21 0.32 0.13 0.39 0.29 0.49 0.09 0.06 0.17 0.18 0.27 0.10 0.09 0.18 0.17 0.17 0.19 0.24 0.01

1YV9(A) 0.09 0.09 0.09 0.15 0.16 0.09 -0.06 -0.21 0.54 0.21 1 0.60 0.22 0.39 0.34 0.39 0.21 0.14 0.29 0.35 0.91 0.12 0.36 0.24 0.29 0.29 0.24 0.19 0.15

1ZJJ(A) 0.12 0.12 0.12 0.00 0.00 -0.06 -0.11 -0.26 0.62 0.32 0.60 1 0.13 0.29 0.36 0.32 0.09 0.27 0.30 0.19 0.64 0.02 0.40 0.09 0.26 0.30 0.30 0.30 0.07

2B30(C) -0.20 -0.20 -0.20 0.70 0.61 0.65 -0.17 -0.28 0.19 0.13 0.22 0.13 1 0.16 0.01 0.15 0.65 0.02 0.07 0.71 0.17 0.06 0.22 0.64 -0.04 0.07 -0.19 -0.12 0.19

2HI0(A) 0.02 0.02 0.02 0.04 0.04 0.01 -0.28 -0.43 0.41 0.39 0.39 0.29 0.16 1 0.22 0.60 0.12 0.15 0.29 0.25 0.37 0.18 0.29 0.19 0.35 0.29 0.28 0.23 0.16

2HOQ(A) 0.06 0.06 0.06 -0.16 -0.16 -0.12 -0.19 -0.35 0.35 0.29 0.34 0.36 0.01 0.22 1 0.41 -0.03 0.02 0.18 0.07 0.32 0.13 0.22 -0.01 0.45 0.18 0.24 0.38 -0.04

2HSZ(A) 0.01 0.01 0.01 0.02 0.02 0.00 -0.11 -0.26 0.40 0.49 0.39 0.32 0.15 0.60 0.41 1 0.09 0.12 0.24 0.16 0.33 0.21 0.20 0.16 0.31 0.24 0.17 0.28 0.09

2PQ0(A) -0.12 -0.12 -0.12 0.94 0.79 0.82 -0.10 -0.25 0.17 0.09 0.21 0.09 0.65 0.12 -0.03 0.09 1 -0.06 0.03 0.72 0.19 0.02 0.22 0.78 -0.09 0.03 -0.22 -0.16 0.17

2PR7(A) 0.00 0.00 0.00 -0.10 -0.10 -0.12 -0.25 -0.40 0.13 0.06 0.14 0.27 0.02 0.15 0.02 0.12 -0.06 1 -0.06 0.07 0.15 -0.09 0.03 -0.01 0.08 -0.06 0.12 0.10 0.00

2YBD(A) 0.06 0.06 0.06 -0.08 -0.09 -0.15 0.01 -0.14 0.21 0.17 0.29 0.30 0.07 0.29 0.18 0.24 0.03 -0.06 1 0.12 0.28 0.34 0.58 0.10 0.14 1.00 0.09 -0.03 0.12

3DAO(A) -0.02 -0.02 -0.02 0.64 0.59 0.59 -0.21 -0.36 0.25 0.18 0.35 0.19 0.71 0.25 0.07 0.16 0.72 0.07 0.12 1 0.30 0.12 0.28 0.70 0.06 0.12 -0.02 0.01 0.24

3EPR(A) 0.09 0.09 0.09 0.08 0.10 0.03 -0.06 -0.21 0.61 0.27 0.91 0.64 0.17 0.37 0.32 0.33 0.19 0.15 0.28 0.30 1 0.10 0.35 0.19 0.26 0.28 0.23 0.21 0.14

3FVV(A) 0.00 0.00 0.00 -0.02 -0.02 0.06 -0.03 -0.18 0.04 0.10 0.12 0.02 0.06 0.18 0.13 0.21 0.02 -0.09 0.34 0.12 0.10 1 0.22 0.12 0.06 0.35 0.02 -0.02 0.32

3M9L(A) -0.06 -0.06 -0.06 0.11 0.12 0.05 0.07 -0.08 0.35 0.09 0.36 0.40 0.22 0.29 0.22 0.20 0.22 0.03 0.58 0.28 0.35 0.22 1 0.19 0.16 0.59 0.13 0.07 0.28

3NIW(A) -0.16 -0.16 -0.16 0.71 0.56 0.58 -0.08 -0.24 0.16 0.18 0.24 0.09 0.64 0.19 -0.01 0.16 0.78 -0.01 0.10 0.70 0.19 0.12 0.19 1 0.0 0.1 -0.2 0.0 0.3

3QNM(A) 0.03 0.03 0.03 -0.22 -0.22 -0.15 -0.31 -0.46 0.27 0.17 0.29 0.26 -0.04 0.35 0.45 0.31 -0.09 0.08 0.14 0.06 0.26 0.06 0.16 0.01 1 0.1 0.3 0.3 0.1

3R09(A) 0.06 0.06 0.06 -0.08 -0.09 -0.15 0.01 -0.14 0.21 0.17 0.29 0.30 0.07 0.29 0.18 0.24 0.03 -0.06 1 0.12 0.28 0.35 0.59 0.10 0.12 1 0.09 -0.03 0.12

3UMB(A) 0.74 0.74 0.74 -0.28 -0.27 -0.35 -0.50 -0.57 0.18 0.19 0.24 0.30 -0.19 0.28 0.24 0.17 -0.22 0.12 0.09 -0.02 0.23 0.02 0.13 -0.15 0.26 0.09 1 0.3 -0.1

4EEK(A) 0.08 0.08 0.08 -0.19 -0.22 -0.22 -0.35 -0.50 0.20 0.24 0.19 0.30 -0.12 0.23 0.38 0.28 -0.16 0.10 -0.03 0.01 0.21 -0.02 0.07 -0.04 0.30 -0.03 0.27 1 -0.07

4GXT(A) -0.25 -0.25 -0.25 0.06 0.04 0.19 -0.10 -0.25 0.10 0.01 0.15 0.07 0.19 0.16 -0.04 0.09 0.17 0.00 0.12 0.24 0.14 0.32 0.28 0.27 0.13 0.12 -0.15 -0.07 1

Table 7: The functional residue alignments of high scoring SG proteins to the consensus signatures.

Position

Subgroup PDB ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

L-2 Haloacid Dehalogenase

1JUD(A) D10 Y12 G13 T14 R41 Y91 L117 S118 N119 G120 K151 y157 s175 N177 D180

1AQ6(A) D8 Y10 G11 T12 R39 Y89 l113 S114 N115 g116 K147 Y153 s171 N173 D176

2NO4(A) D11 Y13 G14 t15 R42 Y92 l118 s119 n120 g121 K152 Y158 s176 N178 D181

SG 3UMB(A) D10 Y12 G13 T14 R41 Y95 l121 S122 N123 g124 K155 Y161 s179 N181 D184

Sugar Phosphatase

1YMQ(A) D8 D10 G11 T12 T43 G44 R45 K188 D211 G212 N214 D215

1U02(A) D7 D9 G10 T11 T45 G46 R47 K161 D179 D180 T182 D183

1TJ3(A) D9 D11 n12 t13 T41 G42 r43 K163 D186 S187 N189 D190

SG

2B30(C) D33 D35 G36 T37 T68 G69 R70 K225 D248 A249 N251 D252

2PQ0(A) D9 D11 G12 T13 T43 G44 R45 K184 D207 G208 N210 D211

3DAO(A) D8 D10 g11 T12 S43 G44 R45 K193 D216 N217 N219 D220

3NIW(A) D9 D11 G12 T13 S43 G44 R45 K196 D219 g220 N222 D223

C-terminal Domain Phosphatase

3EF0(A) D170 l171 D172 T174 T243 Y249 r271 K280 D297 D298

2GHQ(B) D96 L97 D98 t100 t152 Y158 R178 K190 D206 N207

The second row of Table 5 indicates the 23 functionally important positions of the

spatially aligned residues and each of the rows 3-31 represents an individual protein

structure. Each of the 23 columns represents an aligned spatial position. The right-most

column in Table 5 specifies the PDB ID along with the subunit used in the alignment. For

example, the entry 1JUD(A) for L-2 haloacid dehalogenase from Pseudomonas sp. YL

indicates that chain A was used for aligning to the other proteins. Some of the protein

structures consist of one subunit; in this case the chain is indicated as A. For the

representative proteins, chain A was used for L-2 haloacid dehalogenase from

Pseudomonas sp. YL (PDB ID 1JUD), L-2 haloacid dehalogenase from Xanthobacter

autotrophicus (PDB ID 1AQ6), DehIVa from Burkholderia cepacia (PDB ID 2NO4),

BT4131 from Bacteroides thetaiotaomicron (PDB ID 1YMQ), trehalose-6-phosphate

phosphatase from Thermoplasma acidophilum (PDB ID 1U02), sucrose-phosphatase

from Synechocystis sp. PCC 6803 (PDB ID 1TJ3), and Fcp1 from Schizosaccharomyces

pombe (PDB ID 3EF0) while chain B was used for Scp1 from Homo sapiens (PDB ID

2GHQ). The specific chain used was determined based on the quality of alignment of

proteins within each subgroup. Chain A was used for all SG proteins except

HAD/COF-like hydrolase from Plasmodium vivax (PDB ID 2B30), for which chain C was used.

For the alignments shown in Table 5, the corresponding normalized SALSA

scores are shown in Table 6. The SALSA scores are normalized so that a perfect local

alignment of residues in the consensus signature positions has a score of 1. The first

eight rows and columns of Table 6 form three blocks along the diagonal, showing the

scores of the proteins of known function against each other. The diagonal blocks have

high, positive scores in the 0.793 – 1 range, indicating a good match of residues at the

local active site. The off-diagonal blocks in the first eight rows and columns have

negative scores, indicating a poor match at the local active site. These scores for the

proteins of known function against each other serve to guide the interpretation of the

scores for the SG proteins. The next 21 rows are the SG proteins. Five of these have high,

positive scores against one set of proteins of known function and either negative scores or

low, positive scores (~0.01) against the other two sets of proteins of known function. The

other 16 SG proteins do not show good matching scores against the three sets of proteins

of known function. Table 7 shows the alignment of the predicted residues for the five SG

proteins that were predicted to have the function of either L-2 haloacid dehalogenase or

sugar phosphatase.
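The way Table 6 is read in this paragraph can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the scores are copied from three SG rows of Table 6, while the cutoff of 0.5 and the helper names `subgroup_means` and `assign` are assumptions introduced here.

```python
from statistics import mean

# Proteins of known function, grouped as in the text.
KNOWN = {
    "L-2 haloacid dehalogenase": ("1JUD", "1AQ6", "2NO4"),
    "sugar phosphatase": ("1YMQ", "1U02", "1TJ3"),
    "C-terminal domain phosphatase": ("3EF0", "2GHQ"),
}

# Normalized SALSA scores copied from three SG rows of Table 6.
SCORES = {
    "3UMB": {"1JUD": 0.74, "1AQ6": 0.74, "2NO4": 0.74,
             "1YMQ": -0.28, "1U02": -0.27, "1TJ3": -0.35,
             "3EF0": -0.50, "2GHQ": -0.57},
    "2PQ0": {"1JUD": -0.12, "1AQ6": -0.12, "2NO4": -0.12,
             "1YMQ": 0.94, "1U02": 0.79, "1TJ3": 0.82,
             "3EF0": -0.10, "2GHQ": -0.25},
    "1PW5": {"1JUD": -0.01, "1AQ6": -0.01, "2NO4": -0.01,
             "1YMQ": 0.11, "1U02": 0.11, "1TJ3": 0.05,
             "3EF0": -0.17, "2GHQ": -0.32},
}

CUTOFF = 0.5  # hypothetical threshold, chosen only for this illustration


def subgroup_means(row):
    """Average score of one SG protein against each set of known proteins."""
    return {group: mean(row[p] for p in members)
            for group, members in KNOWN.items()}


def assign(row, cutoff=CUTOFF):
    """Return the best-matching subgroup, or None if no mean reaches the cutoff."""
    means = subgroup_means(row)
    best = max(means, key=means.get)
    return best if means[best] >= cutoff else None


for pdb_id, row in SCORES.items():
    print(pdb_id, assign(row))
```

With these inputs, 3UMB is matched to the L-2 haloacid dehalogenases, 2PQ0 to the sugar phosphatases, and 1PW5 to none of the three sets. Any cutoff between the low scores of the unassigned proteins and the high scores of the assigned ones would give the same outcome for these three rows.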

3.1. L-2 haloacid dehalogenase

Three representative L-2 haloacid dehalogenases, from

Pseudomonas sp. YL (PDB ID 1JUD), Xanthobacter autotrophicus (PDB ID 1AQ6), and

Burkholderia cepacia (PDB ID 2NO4), were chosen for the study. The structural

alignments of these proteins are shown in Figure 3. L-2 haloacid dehalogenases catalyze

the reaction of L-2 haloalkanoic acids to D-2 hydroxyalkanoic acids [29]. The core domain

of these dehalogenases carries an inserted cap domain, a 4-helix bundle, at the

interface. Arginine at position 6 (Table 4 and 5; R41 in 1JUD) of L-2 haloacid

dehalogenases is situated away from the substrate at the bottom of the cleft

between the core and cap domains and is assumed to assist in constructing the active site

of the enzyme. Catalysis by L-2 haloacid dehalogenase proceeds through the aspartate at

position 1 (D10 in 1JUD), which makes a nucleophilic attack on the substrate to form an

ester intermediate [29]. Serine (position 9; S118 in 1JUD) hydrogen-bonds to and orients the

aspartate of position 1 (D10 in 1JUD) for substrate binding. Tyrosine at position 3

(Y12 in 1JUD) is predicted to abstract the halide ion from the substrate. Asparagine at

position 21 (N177 in 1JUD) and aspartate at position 23 (D180 in 1JUD) are believed to

either hydrogen-bond to and activate a water molecule so that it can attack the carbonyl

carbon atom of the intermediate, or to situate this water molecule so that it can hydrogen-

bond with aspartate at position 1 (D10 in 1JUD). The position or activation of this water

molecule is mediated by aspartate at position 23 (D180 in 1JUD) via hydrogen bonding

between its oxygen atoms and lysine of position 15 (K151 in 1JUD) and tyrosine of

position 18 (Y157 in 1JUD). This water also binds to the substrate along with serine

(position 9, S118 in 1JUD) through hydrogen bonding. Another water molecule is

presumably activated by either lysine (position 15; K151 in 1JUD) or threonine (position

5; T14 in 1JUD) for the attack on the intermediate.

Figure 3: A) Structural alignment and B) functional residue alignment of L-2 haloacid

dehalogenases. Shown in yellow is Pseudomonas sp. YL L-2 haloacid dehalogenase

(PDB ID 1JUD), cyan Xanthobacter autotrophicus (PDB ID 1AQ6), and red Burkholderia

cepacia (PDB ID 2NO4). The alignment was made using YASARA software.

3.2. Sugar phosphatase

For the sugar phosphatase subgroup, the three proteins studied were BT4131 from

Bacteroides thetaiotaomicron, T6PP from Thermoplasma acidophilum, and SPP from

Synechocystis sp. PCC 6803, with the PDB IDs 1YMQ, 1U02, and 1TJ3, respectively.

The structural alignments of sugar phosphatases are shown in Figure 4. These proteins

also contain a cap domain, here a 4-stranded beta-sheet with two helices [30]. Catalysis is

carried out by nucleophilic attack of the aspartate at position 1 (Table 4; D8 in 1YMQ) on the

substrate's phosphorus atom to form a covalent intermediate. A water molecule

stabilizes the intermediate. Lysine at position 16 (K188 in 1YMQ) interacts with the

aspartate at position 3 (D10 in 1YMQ) to stabilize the charge on the aspartate. Aspartate

at position 3 (D10 in 1YMQ) acts as a general acid donating its proton to one of the

oxygen atoms. Arginine at position 11 (R45 in 1YMQ) forms a salt bridge with the

aspartate at position 3 (D10 in 1YMQ). Threonine at position 9 (T43 in 1YMQ) interacts

with the phosphoryl oxygen of the substrate. Aspartates at positions 1, 3 and 20 (D8, D10,

D211, respectively, in 1YMQ) coordinate the metal ion. The residues at positions 21 (G212

in 1YMQ) and 23 (D215 in 1YMQ) interact with a water molecule that coordinates the

metal ion and lysine (position 16; K188 in 1YMQ).

Figure 4: A) Structural alignment and B) functional residue alignment of sugar

phosphatases. Shown in green is Bacteroides thetaiotaomicron sugar phosphatase (PDB

ID 1YMQ), magenta Thermoplasma acidophilum (PDB ID 1U02), and blue

Synechocystis sp. PCC 6803 (PDB ID 1TJ3). The alignment was made using YASARA

software.

3.3. C-terminal domain phosphatase

Two representative proteins were chosen for the study of the C-terminal domain

phosphatase subgroup. They were Fcp1 from Schizosaccharomyces pombe and Scp1

from Homo sapiens with the PDB IDs of 3EF0 and 2GHQ, respectively. The structural

alignments of these proteins are shown in Figure 5. These proteins carry out hydrolysis

via the nucleophilic attack of the aspartate at position 1 (D170 in 3EF0) to transfer the

phosphoryl group [31]. Aspartate at position 19 (D297 in 3EF0) acts as a general base

activating a water molecule for the reaction. Lysine in position 17 (K280 in 3EF0) forms

a salt bridge with the aspartate in position 19 (D297 in 3EF0). This aspartate (position 19)

and aspartate at position 3 (D172 in 3EF0) activate the water molecule for the hydrolysis.

Tyrosine at position 13 (Y249 in 3EF0) forms a hydrogen bond with the aspartate of

position 3 (D172 in 3EF0). Arginine at position 14 (R271 in 3EF0) hydrogen bonds with

Serine2 and Threonine4 (2 and 4 are sequence numbers) and is found to be important for

the binding of the substrate and the C-terminal domain [31].

Figure 5: A) Structural alignment and B) functional residue alignment of C-terminal

domain phosphatases. Shown in yellow is Schizosaccharomyces pombe C-terminal

domain phosphatase (PDB ID 3EF0) and green Homo sapiens (PDB ID 2GHQ). The

alignment was made using YASARA software.

IV. Discussion

The Consensus Signature for each of the three known functional types was

obtained from the POOL scores and from literature information about the individual

representative proteins, followed by alignment to the other representative proteins within

the same functional subgroup. All of the representative proteins were aligned to each

other for the overall spatial alignment as shown in Table 4. The alignments of the SG

proteins to the representative proteins of three subgroups indicated that one of the SG

proteins is likely to have the function of the L-2 haloacid dehalogenase and four are

likely to be sugar phosphatases. As shown in Table 7, the predicted active site residues in

chain A of RSc1362 from Ralstonia solanacearum (PDB ID 3UMB) align well spatially

with all of the Consensus Signature positions of L-2 haloacid dehalogenase. The

SALSA score of this SG protein to the three known L-2 haloacid dehalogenases is 0.74,

suggesting that the putative annotation of L-2 haloacid dehalogenase is likely to be

correct.

Four SG proteins, HAD/COF-like hydrolase from Plasmodium vivax (PDB ID

2B30), hypothetical protein from Geobacillus kaustophilus (PDB ID 2PQ0), putative

phosphatase from Eubacterium rectale (PDB ID 3DAO), and haloacid dehalogenase-like

hydrolase from Bacteroides thetaiotaomicron (PDB ID 3NIW) were shown to align with

high SALSA scores in the 0.56 – 0.94 range to the sugar phosphatases. There is some

variability in the active sites among the sugar phosphatases, perhaps because of

differences in substrate preferences.

The residues of chain C of HAD/COF-like hydrolase from Plasmodium vivax

(PDB ID 2B30) spatially align to all the CS residues of the known sugar phosphatases

except for A249. Notice in column 21 that A249 is aligned with four different types of

residues. However, A249 does not match any of the corresponding residues for the

proteins of known function (G/D/S) at that spatial position. All of the residues of this SG

protein were predicted by POOL to be important for catalysis. The SALSA scores of

HAD/COF-like hydrolase are 0.70, 0.61, and 0.65 when aligned to BT4131 from Bacteroides

thetaiotaomicron (PDB ID 1YMQ), trehalose-6-phosphate phosphatase from

Thermoplasma acidophilum (PDB ID 1U02), and sucrose-phosphatase from

Synechocystis sp. PCC 6803 (PDB ID 1TJ3), respectively. The average of these scores

was ~0.65.

Residues of chain A of hypothetical protein from Geobacillus kaustophilus (PDB

ID 2PQ0) aligned perfectly to the CS of the sugar phosphatases, not counting the variable

position 21. G208 of 2PQ0 (position 21) aligned with G212 of 1YMQ(A), but not the

others in the same column. The SALSA scores of this SG protein are 0.94, 0.79, and 0.82

against BT4131 from Bacteroides thetaiotaomicron (PDB ID 1YMQ), trehalose-6-phosphate

phosphatase from Thermoplasma acidophilum (PDB ID 1U02), and sucrose-phosphatase

from Synechocystis sp. PCC 6803 (PDB ID 1TJ3), respectively. The average

score of 2PQ0 was ~0.85. This score is in the same range as those of the known sugar

phosphatases with each other.

The residues of chain A of putative phosphatase from Eubacterium rectale (PDB ID

3DAO) were mostly predicted by POOL to be significant; the exception is the spatially

aligned residue g11. S43 (position 9) and N217 (position 21) did not align to the CS of any

of the sugar phosphatases. The SALSA scoring method gives scores of 0.64, 0.59, and 0.60

against BT4131 from Bacteroides thetaiotaomicron (PDB ID 1YMQ), trehalose-6-phosphate

phosphatase from Thermoplasma acidophilum (PDB ID 1U02), and sucrose-phosphatase

from Synechocystis sp. PCC 6803 (PDB ID 1TJ3), respectively, with an

average of 0.61.

The CS residues of chain A of haloacid dehalogenase-like hydrolase from

Bacteroides thetaiotaomicron (PDB ID 3NIW) overlap with all the CS positions of the

sugar phosphatases except for S43 (position 9); the known sugar phosphatases have a

(chemically similar) T at this position. Also, residue g220 at position 21 was not

predicted by the POOL method; note that this position is variable in the known sugar

phosphatases. The scores obtained from SALSA were 0.71, 0.56, and 0.58 when aligned to

BT4131 from Bacteroides thetaiotaomicron (PDB ID 1YMQ), trehalose-6-phosphate

phosphatase from Thermoplasma acidophilum (PDB ID 1U02), and sucrose-phosphatase

from Synechocystis sp. PCC 6803 (PDB ID 1TJ3), respectively; their average is 0.62.

Alignments of the other 16 SG proteins as well as their SALSA scores indicate that they

do not possess the function of L-2 haloacid dehalogenase, sugar phosphatase, or C-

terminal domain phosphatase.
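The residue-by-residue comparisons described above for the four predicted sugar phosphatases can be reproduced from the Table 7 rows. The following is a minimal sketch under stated assumptions: a residue counts as a match if its one-letter type equals that of any known sugar phosphatase in the same column, letter case is ignored, and the function names are hypothetical. It is not the SALSA scoring method itself.

```python
# Sugar phosphatase rows of Table 7, in the column order printed there.
KNOWN_ROWS = {
    "1YMQ": "D8 D10 G11 T12 T43 G44 R45 K188 D211 G212 N214 D215",
    "1U02": "D7 D9 G10 T11 T45 G46 R47 K161 D179 D180 T182 D183",
    "1TJ3": "D9 D11 n12 t13 T41 G42 r43 K163 D186 S187 N189 D190",
}
SG_ROWS = {
    "2B30": "D33 D35 G36 T37 T68 G69 R70 K225 D248 A249 N251 D252",
    "2PQ0": "D9 D11 G12 T13 T43 G44 R45 K184 D207 G208 N210 D211",
    "3DAO": "D8 D10 g11 T12 S43 G44 R45 K193 D216 N217 N219 D220",
    "3NIW": "D9 D11 G12 T13 S43 G44 R45 K196 D219 g220 N222 D223",
}


def residue_types(row):
    """One-letter residue type of each label, ignoring letter case."""
    return [label[0].upper() for label in row.split()]


def allowed_types(known_rows):
    """For each column, the set of residue types seen in the known proteins."""
    columns = zip(*(residue_types(row) for row in known_rows.values()))
    return [set(column) for column in columns]


def mismatches(sg_row, allowed):
    """Labels of the SG residues whose type occurs in no known protein."""
    labels = sg_row.split()
    return [label for label, types in zip(labels, allowed)
            if label[0].upper() not in types]


ALLOWED = allowed_types(KNOWN_ROWS)
for pdb_id, row in SG_ROWS.items():
    print(pdb_id, mismatches(row, ALLOWED))
```

Under these assumptions the sketch flags A249 in 2B30, S43 and N217 in 3DAO, S43 in 3NIW, and nothing in 2PQ0, in line with the discussion above.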

V. Conclusions

Based on the alignment to the representative proteins and the SALSA scores,

RSc1362 from Ralstonia solanacearum is likely to function as L-2 haloacid dehalogenase

while HAD/COF-like hydrolase from Plasmodium vivax, hypothetical protein from

Geobacillus kaustophilus, putative phosphatase from Eubacterium rectale, and haloacid

dehalogenase-like hydrolase from Bacteroides thetaiotaomicron are likely to function as

sugar phosphatases. Docking studies could help to identify the likely native substrate for

each of the five proteins for which function could be assigned.

In some cases, although it is not possible at this time to predict the biochemical

function, it can be established that certain pairs of SG proteins have functions similar to

each other. For example, the putative nagD protein from Thermotoga maritima with PDB

ID 1PW5 and the hypothetical protein from Enterococcus faecalis with PDB ID 1YV9

have a SALSA similarity score of 0.54, suggesting that these two proteins have similar

function. High scores are observed for many pairs of SG proteins (Table 6). The highest

SALSA score between two SG proteins is 1. These two SG proteins are both from

Pseudomonas fluorescens Pf-5. The two structures were reported by the same Structural

Genomics group and have different protein names and different PDB IDs, 2YBD and

3R09, but they are in fact the same protein with 100% sequence identity.
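Pairs of SG proteins with similar local sites can be picked out of Table 6 by filtering and ranking the pairwise scores. The sketch below uses only a handful of SG-versus-SG entries copied from Table 6, not the full matrix; the cutoff of 0.5 and the function name `similar_pairs` are assumptions made for illustration.

```python
# Selected SG-vs-SG entries copied from Table 6 (a subset, not the full matrix).
PAIR_SCORES = {
    ("2YBD", "3R09"): 1.00,
    ("1YV9", "3EPR"): 0.91,
    ("1ZJJ", "3EPR"): 0.64,
    ("1PW5", "1ZJJ"): 0.62,
    ("1PW5", "3EPR"): 0.61,
    ("2HI0", "2HSZ"): 0.60,
    ("1PW5", "1YV9"): 0.54,
    ("1PW5", "2PR7"): 0.13,
    ("2HOQ", "2PR7"): 0.02,
}


def similar_pairs(pair_scores, cutoff):
    """Pairs scoring at or above the cutoff, highest score first."""
    hits = [(pair, score) for pair, score in pair_scores.items()
            if score >= cutoff]
    # Sort by descending score, then by PDB IDs so ties are ordered predictably.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


for (first, second), score in similar_pairs(PAIR_SCORES, cutoff=0.5):
    print(f"{first} {second} {score:.2f}")
```

The 2YBD/3R09 pair comes out on top with a score of 1, and the 1PW5/1YV9 pair cited in the text is retained at 0.54, while the two low-scoring pairs are dropped.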

The results from this study await experiments to confirm the

computational predictions of the functional importance of residues as well as the function of

the SG proteins. In this study, three subgroups of the HAD superfamily were analyzed

and five SG proteins were predicted to have the function of either L-2 haloacid

dehalogenase or sugar phosphatase. None of the studied SG

proteins is likely to possess the function of C-terminal domain phosphatase. There are

still many proteins in the HAD superfamily, of both known and unknown function, that

remain to be studied, since HAD is a large superfamily of proteins. Some of the SG

proteins, if functionally classified, may have many potential applications in the

bioremediation of the soil and the ground water in the United States.

References

[1] Olaniran, A., Pillay, D., and Pillay, B. (2004) Haloalkane and haloacid dehalogenases

from aerobic bacterial isolates indigenous to contaminated sites in Africa demonstrate

diverse substrate specificities, Chemosphere 55, 27-33.

[2] Russell, H. H., Matthews, J. E., and Guy, W. S. (1992) TCE removal from

contaminated soil and groundwater, EPA Environmental Engineering Sourcebook.

[3] Doherty, R. E. (2000) A History of the Production and Use of Carbon Tetrachloride,

Tetrachloroethylene, Trichloroethylene and 1, 1, 1-Trichloroethane in the United States:

Part 1--Historical Background; Carbon Tetrachloride and Tetrachloroethylene,

Environmental Forensics 1, 69-81.

[4] Mcnab, W. W., Ruiz, R., and Reinhard, M. (2000) In-situ destruction of chlorinated

hydrocarbons in groundwater using catalytic reductive dehalogenation in a reactive well:

Testing and operational experiences, Environmental Science and Technology 34, 149-153.

[5] Burroughs, A. M., Allen, K. N., Dunaway-Mariano, D., and Aravind, L. (2006)

Evolutionary genomics of the HAD superfamily: understanding the structural adaptations

and catalytic diversity in a superfamily of phosphoesterases and allied enzymes, Journal

of Molecular Biology 361, 1003-1034.

[6] Lahiri, S. D., Zhang, G., Dunaway-Mariano, D., and Allen, K. N. (2006)

Diversification of function in the haloacid dehalogenase enzyme superfamily: The role of

the cap domain in hydrolytic phosphorus-carbon bond cleavage, Bioorganic Chemistry 34,

394-409.

[7] Peisach, E., Selengut, J. D., Dunaway-Mariano, D., and Allen, K. N. (2004) X-ray

crystal structure of the hypothetical phosphotyrosine phosphatase MDP-1 of the haloacid

dehalogenase superfamily, Biochemistry 43, 12770-12779.

[8] Parsons, J. F., Lim, K., Tempczyk, A., Krajewski, W., Eisenstein, E., and Herzberg,

O. (2002) From structure to function: YrbI from Haemophilus influenzae (HI1679) is a

phosphatase, Proteins: Structure, Function, and Bioinformatics 46, 393-404.

[9] Krieger, E., and Vriend, G. (2002) Models@Home: distributed computing in

bioinformatics using a screensaver based approach, Bioinformatics 18, 315-318.

[10] Ridder, I., and Dijkstra, B. (1999) Identification of the Mg2+-binding site in the P-

type ATPase and phosphatase members of the HAD (haloacid dehalogenase) superfamily

by structural similarity to the response regulator protein CheY, Biochemical Journal

339, 223-226.

[11] Baker, D., and Sali, A. (2001) Protein structure prediction and structural genomics,

Science 294, 93-96.

[12] Lopez, G., Rojas, A., Tress, M., and Valencia, A. (2007) Assessment of predictions

submitted for the CASP7 function prediction category, Proteins: Structure, Function, and

Bioinformatics 69, 165-174.

[13] Tong, W., Wei, Y., Murga, L. F., Ondrechen, M. J., and Williams, R. J. (2009)

Partial order optimum likelihood (POOL): maximum likelihood prediction of protein

active site residues using 3D Structure and sequence properties, PLoS Computational

Biology 5, e1000266.

[14] Ondrechen, M. J., Clifton, J. G., and Ringe, D. (2001) THEMATICS: a simple

computational predictor of enzyme function from structure, Proceedings of the National

Academy of Sciences 98, 12473-12478.

[15] Capra, J. A., Laskowski, R. A., Thornton, J. M., Singh, M., and Funkhouser, T. A.

(2009) Predicting protein ligand binding sites by combining evolutionary sequence

conservation and 3D structure, PLoS Computational Biology 5, e1000585.

[16] Sankararaman, S., and Sjölander, K. (2008) INTREPID—INformation-theoretic

TREe traversal for Protein functional site IDentification, Bioinformatics 24, 2445-2452.

[17] Madura, J. D., Briggs, J. M., Wade, R. C., Davis, M. E., Luty, B. A., Ilin, A.,

Antosiewicz, J., Gilson, M. K., Bagheri, B., and Scott, L. R. (1995) Electrostatics and

diffusion of molecules in solution: simulations with the University of Houston Brownian

Dynamics program, Computer Physics Communications 91, 57-95.

[18] Gilson, M. K. (1993) Multiple‐site titration and molecular modeling: Two rapid

methods for computing energies and forces for ionizable groups in proteins, Proteins:

Structure, Function, and Bioinformatics 15, 266-282.

[19] Majtey, A., Lamberti, P., and Prato, D. (2005) Jensen-Shannon divergence as a

measure of distinguishability between mixed quantum states, Physical Review A 72,

052310.

[20] Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T., Weissig, H.,

Shindyalov, I. N., and Bourne, P. E. (2000) The protein data bank, Nucleic Acids

Research 28, 235-242.

[21] Wang, Z., Yin, P., Lee, J. S., Parasuram, R., Somarowthu, S., and Ondrechen, M. J.

(2013) Protein function annotation with Structurally Aligned Local Sites of Activity

(SALSAs), BMC Bioinformatics 14, S13.

[22] de Beer, T. A., Berka, K., Thornton, J. M., and Laskowski, R. A. (2014) PDBsum

additions, Nucleic Acids Research 42, D292-D296.

[23] Krissinel, E., and Henrick, K. (2004) Secondary-structure matching (SSM), a new

tool for fast protein structure alignment in three dimensions, Acta Crystallographica

Section D: Biological Crystallography 60, 2256-2268.

[24] Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M.,

Meng, E. C., and Ferrin, T. E. (2004) UCSF Chimera—a visualization system for

exploratory research and analysis, Journal of Computational Chemistry 25, 1605-1612.

[25] Holm, L., and Rosenström, P. (2010) Dali server: conservation mapping in 3D,

Nucleic Acids Research 38, W545-W549.

[26] Eddy, S. R. (2004) Where did the BLOSUM62 alignment score matrix come from?,

Nature Biotechnology 22, 1035-1036.

[27] Murzin, A. G., Brenner, S. E., Hubbard, T., and Chothia, C. (1995) SCOP: a

structural classification of proteins database for the investigation of sequences and

structures, Journal of Molecular Biology 247, 536-540.

[28] Akiva, E., Brown, S., Almonacid, D. E., Barber, A. E., Custer, A. F., Hicks, M. A.,

Huang, C. C., Lauck, F., Mashiyama, S. T., and Meng, E. C. (2013) The Structure–Function Linkage Database, Nucleic Acids Research, gkt1130.

[29] Hisano, T., Hata, Y., Fujii, T., Liu, J.-Q., Kurihara, T., Esaki, N., and Soda, K.

(1996) Crystal structure of L-2-haloacid dehalogenase from Pseudomonas sp. YL: an

α/β hydrolase structure that is different from the α/β hydrolase fold, Journal of Biological Chemistry 271, 20322-20330.

[30] Rao, K. N., Kumaran, D., Seetharaman, J., Bonanno, J. B., Burley, S. K., and

Swaminathan, S. (2006) Crystal structure of trehalose‐6‐phosphate phosphatase–related

protein: Biochemical and biological implications, Protein Science 15, 1735-1744.

[31] Zhang, Y., Kim, Y., Genoud, N., Gao, J., Kelly, J. W., Pfaff, S. L., Gill, G. N.,

Dixon, J. E., and Noel, J. P. (2006) Determinants for dephosphorylation of the RNA

polymerase II C-terminal domain by Scp1, Molecular Cell 24, 759-770.