Molecular Modelling of a Thermostable Glycoside Hydrolase from Caldivirga m aquilingensis and Its Substrate Docking Mechanism for Galactooligosaccharides Synthesis

Glycoside hydrolases (GHs) are very important enzymes that can catalyze the breakdown of the glycosidic bonds between carbohydrates and non-carbohydrates to synthesize GOS prebiotic sugars and hydrolyze lactose in the dairy industry. GOS can stimulate the growth of gut microbiota to have beneficial health effects in the host. The current work investigated molecular modelling and lactose substrate docking of a thermostable GH family 1 from Caldivirga maquilingensis strain IC-167 and determined its mechanism for GOS synthesis. The 3D model structure obtained from the Swiss model analysis tool revealed that GH 1 from Caldivirga maquilingensis was a tetramer with a catalytic pocket at the center of each monomer. The overall quality of the model was 93%. When lactose was docked in the catalytic site using AutoDock Vina software, the catalytic amino acid residues were identified to be Glu 387 and Glu 209 which acted as nucleophile and acid/base residues respectively. Other amino acid residues like His 153, Gln 19, Glu 432, Ala 266, Asn 267, Ser 268 and Trp 433 were also found to be surrounding the catalytic site and playing an essential role in ligand binding and recognition of the lactose substrate for GOS synthesis. These findings offer an understanding of how enzyme protein structure determines catalytic specificity, which serves as new knowledge basis to engineer GH 1 from C. maquilingensis for the biosynthesis of GOS with a broad or narrow degree of polymerization range.


Introduction
Production of non-digestible galacto-oligosaccharides (GOS) has continued to gain importance because of their recognition as prebiotic compounds that stimulate growth of bifidobacteria and lactobacilli bacteria associated with beneficial health effects and also as an interesting functional food ingredient [1,2]. Furthermore, GOS are of special interest because they have structurally related oligosaccharides present in human breast milk [3,4]. Because of these benefits, GOS are of great interest for both human and animal nutrition.
Enzymatic synthesis of GOS from lactose is normally conducted using glycoside hydrolases (GH, EC 3.2.1.X) because these enzymes can transfer glycosyl moieties from a donor sugar to an acceptor [5,6]. This method is the one currently used in the dairy industry for GOS synthesis from lactose with the yield of the reaction depending on the relative ratio of the transgalactosylation versus hydrolysis reaction [7,8]. In the reaction for lactose hydrolysis, a galactosyl-enzyme intermediate is formed and instead of transferring the galactose unit to water for hydrolysis reaction, the enzyme transfers it to any suitable nucleophile acceptor sugar present in the reaction system to synthesize various GOS. As a result, GOS are very complex mixtures consisting of numerous and different oligosaccharides varying in their degree of polymerization (DP), glycosidic linkages and anomeric configurations. In general terms, enzymatic hydrolysis of the glycosidic bond takes place via general acid catalysis that requires two critical residues: a proton donor and a nucleophile/base [9,10].
Glycoside hydrolases (GHs) with β-galactosidase activity occur in a variety of microorganisms from the super kingdoms bacteria, archaea and eukaryota. Some of these enzymes have been expressed in host organisms by several conventional techniques. Due to their superior stability properties, glycoside hydrolases from thermophilic microorganisms have become an attractive alternative to the non-thermostable enzymes because they can allow high substrate concentrations to be used due to increased substrate solubility leading to increased GOS yields [11,12]. Reaction yield and specificity of β-galactosidases by the donor and acceptor of galactose depends on the enzyme origin due to differences in its structure and/or in the coupling mechanism of the sugars, leading to different distributions of products of synthesis [13,14].Tertiary structure and amino acids in the active site affect β-galactosidase reaction specificity, being both important since the ability of the enzyme to accept nucleophiles other than water in its active center will determine the proper transfer of the galactose moiety [9]. Specificity will also be affected by the concentrations of the donor and acceptor of the transgalactosylated galactose [11,12].
The number of known protein sequences has currently increased following the publication of the human genome and other sequencing projects and deposited in the gene bank hence the increase in the number of protein sequences far exceeds the number of known protein structures [15]. As such the three dimensional (3D) structure modelling of enzymes is a very important tool used to reveal the identity of important amino acid residues involved in catalysis and to elucidate the mechanism in the enzymatic synthesis of oligosaccharides [16]. It is largely known that the substrate specificity and the mode of action of GHs are driven by details of their three dimensional structures rather than by their global fold [17]. Homology modeling is currently routinely used in many practical applications to generate reliable 3D protein structure models from the amino acid sequences. The structural information from the theoretically modelled complex may help us to understand the binding mode between enzymes and substrates leading to an enhanced designing and engineering of novel enzymes with desired structure and functionality for diverse applications.
In our previous study, a gene encoding for a glycoside hydrolase family 1 (GH1) enzyme from the hyperthermophilic Caldivirga maquilingensis strain IC-167 was successfully cloned and expressed in Escherichia coli then purified to homogeneity. Its biochemical characterization revealed good substrate specificities towards lactose and other synthetic substrates as well as being significantly thermostable which suggested its potential for oligosaccharides synthesis [18]. Currently, only 13 crystal X-ray structures of glycoside hydrolase β-galactosidases have been elucidated and included in the Protein Data Bank, of which only one belongs to GH 1 family [5]. Therefore in the current study we report on the homology molecular modelling and docking of the substrate into the active site of the GH1 enzyme from the hyperthermophilic C. maquilingensis to understand the enzyme-substrate interaction in relation to GOS synthesis, by using various homologies, docking and verification software.

Protein Homology Modelling
The amino acid sequence for the putative gene from C. maquilingensis IC-167 which encoded for a hypothetical GH1 enzyme has already been deposited in the GenBank (NCBI accession number: ABW01253.1). This amino acid sequence was submitted to Swiss-model web server to predict a three dimensional (3D) structure of the GH 1 from C. maquilingensis against any available template protein structure with the highest amino acid sequence similarity to that of GH 1 from C. maquilingensis in the protein data bank (PDB). The options for the submitted sequence for model construction were selected to be either monomer or tetramer enzyme structure. The generated 3D model structure after searching and comparison with an appropriate temple with highest protein structure similarity to the GH1 in the Swiss model server was retrieved from the PDB.

Three Dimensional Structure Verification
The modeled structure revealed from Swiss model was verified through various verification methods. The structure was checked through PROCHEK to calculate the favored regions, additional allowed regions, generously allowed regions and disallowed regions to predict the protein fold quality. Furthermore, the structure was verified using Errat, to calculate the error values based on the statistics of non-bonded atom-atom interactions in the structure in comparison to a database of reliable highresolution structures. In addition the structure was checked through Verify 3d to determine the compatibility of an atomic model (3D) with its own amino acid sequence by assigning a structural class based on its location and environment (alpha, beta, loop, polar, nonpolar etc) and comparing the results to good structures.

Substrate Docking Analysis
The 3D monomer structure generated from Swiss model was then used in the substrate ligand (lactose) docking studies. The docking simulation studies were performed by AutoDock Vina version 5.6. The 3D monomer structure of GH 1 from C. maquilingensis and ligand structure were at first processed using AutoDock software tool with the docking space confined in a grid box centered in the active site of the C. maquilingensis GH1 enzyme. Then a PDBQT file was created for vina software. The low energy pose for best ligand-protein docked model was selected according to results of Vina. The generated output binding mode of the ligand-protein complex was analyzed by Accelary Discovery studio and Chimera programs to understand and observe the interaction of the ligand and enzyme active site amino acid molecules at an atomic level.

Results and Discussion
Three-dimensional structural studies are essential to reveal the identity of important amino acid residues involved in catalysis and to elucidate the mechanism in the enzymatic synthesis of oligosaccharides. It is largely known that the substrate specificity and the mode of action of GHs are driven by details of their three dimensional structures rather than by their global fold [17]. Regarding microbial β-galactosidases, only 13 crystal structures have been elucidated to date and included in the Protein Data Bank [5]. The structures achieved by homology modeling are said to be similar to those of nuclear magnetic resonance (NMR) spectroscopy or X-ray crystallography [15,19].
When the GH 1 from C. maquilingensis amino acid sequence was submitted to the swiss model server to search for a template protein structure with the highest similarity to the target, the most similar template structure was a β-glucosidase from Sulfolobus Solfataricus (PDB: 5ixe.4 ), with 74.69% identity as shown on Figure 1. This result is in agreement with our previous study where we reported GH 1 from C. maquilingensis to have 100% β-galactosidase, 50% β-glucosidase and 25% xylosidase enzyme activities and showed a 73% amino acid identity similarity to that of S. solfataricus, which implied that these two microorganisms are evolutionary related [18].
Glycoside hydrolases (GH) may be classified in families based on sequence similarities. Since there is a relationship between folding and sequence similarities, glycoside hydrolases from the same family have similar spatial structures [9,20]. Therefore, this classification facilitates the identification of amino acid residues presenting homologous function in glycoside hydrolases from the same family. Glycoside hydrolases with β-galactosidase (EC 3.2.1.23) activity occur in a variety of microorganisms from the super kingdoms bacteria, archaea and eukaryote and have been categorized within the GH families 1, 2, 35 and 42. GH families 1, 2 and 35 contain enzymes other than β-galactosidases, while only β-galactosidase has been classified into GH 42 [14].
The predicted 3 D model for GH 1 from C. maquilingensis was a homo-tetramer with four identical monomer subunits forming a symmetric unit as shown on Figure 3. The overall quality of the predicted 3D model was assessed with PROCHECK, Verify 3d and Errat service tools as shown on Figure 2.  The PROCHECK Ramachandran plot statistics for the GH 1 from C. maquilingensis amino acid residues structure atoms revealed that 91.9%, 7.5%, 0.1% and 0.5% atoms resided in the most favored regions, additional allowed regions, generously allowed regions and disallowed regions respectively (Figure 2 (a)). The combined percentage of 99.9 % for all atoms residing in the most favored and allowed regions indicate that there was a good protein fold for the GH 1 from C. maquilingensis structure. When doing the molecular modelling of a crystal x-ray structure of an acidic βgalactosidase from Aspergillus oryzae, Maksimainen et al. [20] reported that validation of their model showed 87.8% of the residues in the most favored regions, 11.5% in additional allowed regions, 0.4% in generously allowed regions, and 0.3% in disallowed regions of the Ramachandran plot, which was similar to the results of our model for the GH 1 from C. maquilingensis. The results of PROCHECK for fold validation of our structure further indicated that a high quality GH 1 from C. maquilingensis 3D model structure was successfully derived from Sulfolobus Solfataricus (PDB: 5ixe.4 ) βglucosidase template in terms of protein folding. The overall quality of the GH 1 from C. maquilingensis structure as displayed by Errat was 93.126% as shown on Figure 2 (b). Further validation of the structure by Verify 3d revealed that 96.70% of the residues had an averaged 3D-1D score >= 0.2 for GH 1 from C. maquilingensis.
As already mentioned above, the predicted 3 D model for GH 1 from C. maquilingensis was a homo-tetramer with four identical monomer subunits with half of them forming an asymmetric unit as shown on Figure 3 (a).
The enzyme monomer subunit molecular mass was previously determined by SDS-PAGE at 56.4 kDa [18]. The crystallographic dyad in the tetramer relates the yellow and cyan colored monomers to the blue and red monomers such that it lies flattened perpendicular to one of the 2-fold axes so that each monomer lies at the corner of a slightly puckered square and only contacts two other monomers in the tetramer. The structure forms a typical (βα) 8 barrel fold with the β-strands and α-helices in each repeat but with substantial long connecting loops in between them. This kind of fold has been observed before and is usually conserved for the family-1 β-glycoside hydrolases [21]. The active site for the GH 1 from C. maquilingensis occurs at the center of the top face of the barrel C-terminal portion and is surrounded by loops connecting the α-helix to the β-strands (Figure 4 (a)). The active site is connected to the surface by a radial tunnel which ends at the center "pocket" of the monomer ( Figure  4 (b)), and probably acts as the binding site for extended oligosaccharide substrates. At the active site center of the monomer, there are two access openings or 'mouths' connected to the tunnel catalytic site.  Molecular docking can fit two molecules together to obtain an energetically favorable conformation for the complex [22]. The structural information from the theoretically modelled complex are essential to help us to understand and reveal the identity of important amino acid residues involved in catalysis and to elucidate the mechanism in the enzymatic synthesis of oligosaccharides. It has been reported that the substrate specificity and the mode of action of GHs are driven by details of their three dimensional structures rather than by their global fold hence structural analyses and site-directed mutagenesis studies carried out in GHs have revealed that linkage specificity is controlled by the topology of only a few key amino acids located at the enzyme active site [23,24]. Therefore GH 1 from C. maquilingensis was docked with lactose substrate as shown on Figure 5.
On the basis of sequence alignment, we identified the active site catalytic amino acid residues in GH 1 from C. maquilingensis as glutamic acid 387 (Glu387), glutamic acid 209 (Glu209) as well as histidine 153 (His 153) with respect to the already determined x-ray crystal structure of Sulfolobus Solfataricus [21]. These residues are located in a pocket found at one side of the TIM barrel domain, in the center of each monomer. The glycoside hydrolases active site is divided into several subsites with enough size to bind a monosaccharide unit. As shown in Figure 5, the subsite that binds the galactose moiety of the docked lactose is denominated subsite -1 (or glycone subsite) whereas the remaining part of the substrate, i.e. glucose moiety is accommodated in the aglycone binding region, which may be formed by several subsites (+1, +2, +3 and so on) depending on the degree of polymerization of the acceptor oligosaccharide substrate. Substrate cleavage point is between subsite -1 (glycone subsite) and +1 (aglycone binding region) [9]. The active site contains a high concentration of residues which are highly conserved in all family-1 glycoside hydrolases. In general terms, enzymatic hydrolysis of the glycosidic bond takes place via general acid catalysis that requires two critical residues: a proton donor and a nucleophile/base [17]. As shown on Figure 5, the interactions of lactose with the binding pocket of GH 1 from C. maquilingensis were analyzed using molecular docking tools. The docking of lactose ligand was observed and the interaction with the lowest free energy was chosen as the best orientation position for the enzyme-ligand interaction by the catalytic residues. As shown in the figure, Glu387 and Glu209 are the nucleophile and acid/base catalyst for GH 1 from C. maquilingensis, respectively. The nucleophile is responsible for formation of a covalent galactosyl-enzyme intermediate while the acid/base is a proton donor. Briefly the proposed catalysis proceeds via a double-displacement reaction involving two steps of hydrolysis/glycosylation and transglycosylation. In the first step, the enzyme is glycosylated by the concerted action of the two carboxylate residues, Glu387 and Glu209, in which a covalent galactosyl-enzyme intermediate is formed and hydrolyzed via oxocarbenium ion-like transition states, while glucose is being released. The second mechanistic step involves the attack of the covalent galactosyl-enzyme intermediate by a water molecule (hydrolysis) or a sugar molecule (transgalactosylation), followed by the transfer of a proton from a water/sugar to the proton donor, in a reverse mode of the first step [25]. The attacking from another sugar molecule results in synthesis of galactooligosaccharides by transgalctosylation whereas a water molecule attack results in hydrolysis products glucose and galactose. It is worth noting that other residues like, alanine (Ala 266), asparagine (Asp267), serine (Ser268), Glutamic acid (Glu432), tryptophan (Trp433) and glutamine (Gln19) found surrounding the catalytic site are within hydrogen bonding distances with the glycone and aglycone playing essential roles in ligand binding and recognition of the lactose molecules and in selecting different acceptor molecules during transgalactosylation unique to this enzyme. In fact the hydroxyl, polar and aromatic side chains found in some of these residues have been reported to provide flexibility and fine-tuning of the position of the nucleophile and also stabilizing the deprotonated state of the substrate-enzyme intermediate which is beneficial for both transgalactosylation and hydrolysis [20,26]. Therefore, the orientation and establishment of bonds between the substrate and both aromatic and polar amino acids define the relevant enzymatic features, such as the affinity, specificity, and activity. There are also a number of negatively charged and polar acidic glutamic acid (Glu) residues around the active site area implying that this enzyme has a good structural basis to act in acidic pH range. Indeed, previously the activity analysis for the GH 1 from C. maquilingensis showed that it had a pH optimum at about 4.5 [18].

Conclusion
Molecular docking simulation elucidated that GH 1 from C. maquilingensis has a tetrameric structure with an active site at the center of its monomer with a tunnel that is connected to the catalytic "pocket". The structural data was used to identify amino acid residues involved in substrate catalysis and having interactions with other residues surrounding the active site.
The findings offer an improved understanding of how enzyme protein structure determines catalytic specificity, which serves as new knowledge basis to engineer GH 1 from C. maquilingensis for the biosynthesis of GOS with a broad or narrow degree of polymerization range. Some of the known glycoside hydrolases are quite specific, whereas others tolerate some slight structural differences. The characterization of the molecular basis of glycoside hydrolases substrate preference may contribute to the comprehension of the enzymatic specificity, a fundamental property of biological systems. Therefore, not only structure but binding site topology are very important factors for understanding the mechanism of recognition of these proteins.