A Computational Vaccine Designing Approached for MERS-CoV Infections
Hiba Siddig Ibrahim1, Shamsoun Khamis Kafi2
1The National Ribat University
2Dean of the faculty of medical laboratory science, The National Ribat University
Abstract
Middle East respiratory syndrome coronavirus (MERS-CoV) is a recently emerged novel coronavirus that causes rapidly progressing disease with multiple organ failure, resembling the SARS-CoV outbreak of 2003-2004. Rapid development of a coronavirus vaccine to prevent pandemic infection has therefore become an objective for scientists and the WHO. One approach is in silico epitope prediction, which can accelerate vaccine development, especially where conventional procedures are difficult to apply, time-consuming, expensive, and subject to FDA approval. The aim of this study was to use IEDB services, as one such computational method, to predict suitable MERS-CoV vaccine epitopes against the most common alleles in the world population, using four selected sequences: the S glycoprotein, the envelope (E) protein, and a modified sequence of each. The results showed that the S glycoprotein, the E protein, and their modified sequences might be considered protective immunogens with high conservancy, because they can elicit both neutralizing antibody and T-cell responses through B-cell, T-helper cell, and cytotoxic T-lymphocyte (CTL) epitopes.
The numbers of B-cell epitopes were 1, 3, 20, and 27 for the E, modified E, S, and modified S sequences, respectively, with 18 epitopes shared between S and modified S. The numbers of CTL epitopes were 63, 41, 602, and 612 for E, modified E, S, and modified S, respectively. For T-helper cells, 685 epitopes were found for each of E and modified E, and 212 and 6896 for S and modified S, respectively. NetCTL, NetChop, and MHC-NP were used to confirm these results, but most of the selected epitopes remain problematic because they contain asparagine, which hides epitopes from recognition by the immune system. Population coverage analysis showed that the putative helper T-cell and CTL epitopes could cover most of the world population in more than 60 geographical regions. According to the AllerHunter results, all of the selected proteins were non-allergenic, which makes this computational vaccine study more attractive as a basis for vaccine synthesis.
Keywords: Middle East Respiratory Syndrome Coronavirus, Severe Acute Respiratory Syndrome Coronavirus, Food and Drug Administration, Immune Epitope Database, FAO, AllerHunter
Copyright © 2017 Science and Education Publishing. All Rights Reserved.
Cite this article:
- Hiba Siddig Ibrahim, Shamsoun Khamis Kafi. A Computational Vaccine Designing Approached for MERS-CoV Infections. American Journal of Infectious Diseases and Microbiology. Vol. 5, No. 1, 2017, pp 4-60. https://pubs.sciepub.com/ajidm/5/1/2
1. Introduction
Vaccine development is one of the most important means of protection against highly infectious diseases, especially when no treatment is available. Vaccine design now increasingly relies on immunoinformatics, which uses software to identify the most immunogenic parts of an organism (epitopes). Such software was used in this study in an attempt to develop a more strongly immunogenic MERS-CoV vaccine; previous MERS-CoV vaccine approaches have been based on inactivated coronavirus, live attenuated coronavirus, the S protein, DNA vaccines, or combination vaccines against coronaviruses. Coronaviruses were first described in the 1960s, isolated from the nasal cavities of patients with the common cold; these strains were named HCoV-229E and HCoV-OC43. In 2003, an outbreak of severe acute respiratory syndrome (SARS) resulted in over 8,000 infections, about 10% of which were fatal. On 24 September 2012, the Egyptian virologist Dr. Ali Mohamed Zaki posted on ProMED-mail the first report of a novel SARS-CoV-like coronavirus, isolated in Jeddah, Saudi Arabia, from the lungs of a 60-year-old male patient with acute pneumonia and acute renal failure; this virus was later named MERS-CoV [1, 2, 3]. MERS-CoV belongs to the group C β-coronaviruses. It is an enveloped, positive-sense ssRNA virus with a genome of about 30 kb and 10 predicted open reading frames (ORFs), including those encoding the E, M, and S proteins. MERS-CoV can be grown in culture. Genome size, organization, and sequence analysis revealed that the novel coronavirus is most closely related to the bat coronaviruses BtCoV-HKU4 and BtCoV-HKU5. A partial spike gene sequence from South African Neoromicia bats was also found to be a close relative of MERS-CoV, as shown by the nucleotide percentage distance substitution model with the complete deletion option in MEGA. These relationships make a common coronavirus vaccine more desirable [3, 4, 5].
This study used the S and E protein sequences, together with modified S and E sequences, in an in silico approach to MERS-CoV vaccine development, and also examined the effect of mutations in the selected sequences on vaccine development. The spike glycoprotein is a trimeric, envelope-anchored, type I fusion glycoprotein that interacts with the human dipeptidyl peptidase 4 (DPP4) receptor to mediate viral entry. It is composed of two subunits: S1, which contains the receptor-binding domain and determines cell tropism, and S2, the location of the cell fusion machinery. The E protein is considered part of the viral membrane [4, 6].
This study showed that the S and E proteins and their modified sequences can be considered safe and highly promising MERS-CoV vaccine candidates, with no predicted allergenicity.
2. Materials & Methods
2.1. Protein Sequence Retrieval
A total of 130 spike (S) glycoprotein and 41 envelope (E) protein sequences of MERS-CoV were retrieved from the NCBI protein database (https://www.ncbi.nlm.nih.gov/protein/) in September 2016. The sequences were collected from different parts of the world, including Saudi Arabia, China, Thailand, the United Kingdom, Qatar, Tunisia, and South Africa. The GenBank accession numbers of the retrieved strains are listed in supplementary Table 1 (envelope protein, E) and Table 2 (spike glycoprotein, S). All methods below were applied to the S, E, modified S, and modified E proteins; the modified S and E sequences were made by randomly changing some amino acids in their reference sequences.
2.2. In Silico PCR Amplification
In silico PCR amplification (https://insilico.ehu.es/PCR_virus/) is a program that simulates PCR amplification against sequenced viruses and also serves as a primer confirmation tool. It was used here for the above viruses with the stored GenBank sequences; the database contains 1783 sequences from 1421 completely sequenced viruses (last update: 2010/05/31).
2.3. Determination of Conserved Regions
The sequences retrieved from NCBI were used to obtain the conserved regions by multiple sequence alignment (MSA). Sequences were aligned with ClustalW as implemented in the BioEdit program, version 7.0.9.0.
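The idea of reading conserved regions off an alignment can be illustrated with a minimal sketch. It assumes that a conserved region is a run of gap-free alignment columns in which every sequence carries the same residue; the function name, the `min_len` parameter, and the toy sequences are hypothetical and are not taken from BioEdit or from the study data.

```python
def conserved_regions(aligned, min_len=1):
    """Return (start, end, segment) for runs of fully conserved, gap-free columns.

    Positions are 1-based, inclusive, and refer to alignment columns.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")

    regions = []
    start = None
    # Iterate one step past the end so that a trailing run is closed.
    for col in range(length + 1):
        conserved = (
            col < length
            and aligned[0][col] != "-"
            and all(s[col] == aligned[0][col] for s in aligned)
        )
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_len:
                regions.append((start + 1, col, aligned[0][start:col]))
            start = None
    return regions


# Toy alignment (hypothetical sequences, not MERS-CoV data).
toy_alignment = [
    "MLPFVQERIG",
    "MLPFVQDRIG",
    "MLP-VQERIG",
]
regions = conserved_regions(toy_alignment, min_len=2)
print(regions)
```

Raising `min_len` to the epitope length of interest would restrict the output to regions long enough to contain a whole candidate peptide.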
2.4. B-cell Epitope Prediction
A B-cell epitope is characterized by being hydrophilic, accessible, and flexible, by having antigenic propensity, and by lying in a beta-turn region. The classical propensity scale methods and the hidden Markov model-based software from the IEDB analysis resource (https://www.iedb.org/) were therefore used for the following predictions:
2.4.1. Prediction of linear B-cell Epitopes
BepiPred from the Immune Epitope Database and Analysis Resource (https://tools.iedb.org/bcell/) was used for linear B-cell epitope prediction from the conserved regions, with a default threshold value of 0.350. BepiPred combines the predictions of a hidden Markov model and the propensity scale of Parker et al., as described in Larsen et al. (Immunome Research, 2006).
2.4.2. Prediction of Surface Accessibility
Surface-accessible epitopes were predicted from the conserved regions with the Emini surface accessibility prediction tool of the Immune Epitope Database (IEDB), using the default threshold value of 1.000 or higher.
2.4.3. Prediction of Epitope Antigenicity Sites
The Kolaskar and Tongaonkar antigenicity method was used to determine the antigenic sites, with a default threshold value of 1.045.
2.4.4. Prediction of Epitope Hydrophilicity
The Parker hydrophilicity prediction tool was used to determine the hydrophilicity of the conserved regions; the default threshold value was 1.286.
2.4.5. Prediction of Beta Turn Sites
The Chou and Fasman beta turn prediction method was used with the default threshold of 1.009 to determine the sites that contain beta turns.
2.4.6. Prediction of Flexibility
The Karplus & Schulz flexibility prediction tool was used to predict chain flexibility in the proteins (for selection of peptide antigens), with a default threshold value of 0.992.
The thresholds of all tools were provided by IEDB; each is calculated by the software as the average score of the tested protein for the corresponding tool.
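The propensity-scale methods above share one pattern: a per-residue scale is averaged over a sliding window, and windows scoring at or above the protein-wide average are reported. The following is a minimal sketch of that pattern only. The scale values are invented placeholders (not the Parker, Emini, or any other published scale), the window length of 4 is arbitrary, and positions are reported as window starts, which may differ from the convention used by the IEDB tools.

```python
def window_scores(sequence, scale, window=7):
    """Average a per-residue scale over each window; return (start, peptide, score)."""
    results = []
    for i in range(len(sequence) - window + 1):
        peptide = sequence[i:i + window]
        score = sum(scale[aa] for aa in peptide) / window
        results.append((i + 1, peptide, score))  # 1-based start position
    return results


def above_average(results):
    """Use the mean window score as threshold and keep windows at or above it."""
    threshold = sum(score for _, _, score in results) / len(results)
    return threshold, [r for r in results if r[2] >= threshold]


# Placeholder scale: 1.0 for residues treated here as hydrophilic, 0.0 otherwise.
# These values are illustrative assumptions, not a published propensity scale.
TOY_SCALE = {aa: 1.0 for aa in "DEKRNQSTHGP"}
TOY_SCALE.update({aa: 0.0 for aa in "ACFILMVWY"})

toy_sequence = "AAAAKDEKAAAA"  # hypothetical sequence
threshold, selected = above_average(window_scores(toy_sequence, TOY_SCALE, window=4))
print(round(threshold, 3), [start for start, _, _ in selected])
```

Because the threshold is derived from the tested sequence itself, the same peptide can fall above or below it depending on the protein it is embedded in.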
2.5. T Cell Epitope Prediction
An antigen sequence was scanned for amino acid patterns indicative of:
2.5.1. MHC Class I Binding Predictions
Peptide binding to MHC class I molecules was assessed with the IEDB MHC I prediction tool (https://tools.iedb.org/mhci/). For MHC-I binding prediction, several alleles were used, including HLA-A, HLA-B, HLA-C, and HLA-E alleles reported as frequent worldwide. Presentation of the MHC-I peptide complex to T lymphocytes involves several steps; the step predicted here was the attachment of cleaved peptides to MHC molecules. The Consensus method, which combines ANN, SMM, and scoring matrices derived from combinatorial peptide libraries (Comblib_Sidney2008), was used, and an epitope length of 9 residues (9-mers) was selected. All conserved epitopes that bound to alleles with a percentile rank of 1.0 or less (a low percentile rank indicates a good binder) were selected for further analysis, as recommended in (Selecting thresholds (cut-offs) for MHC class I and II binding predictions, https://help.iedb.org/entries/23854373-Selecting-thresholds-cut-offs-for-MHC-class-I-and-II-binding-predictions).
Note: for the S glycoprotein, the sequence was divided into 10 parts because of a software limitation: no more than 200 FASTA sequences can be entered [7, 8, 9, 10, 11].
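The selection step described above (enumerate 9-mers, then keep those with a percentile rank of 1.0 or less) can be sketched as follows. This does not call the IEDB tool: the prediction rows, their field names, the allele label, and the rank values are all invented for illustration, and only the filtering logic reflects the text.

```python
def ninemers(sequence, k=9):
    """Yield (start, peptide) for every k-mer, with 1-based start positions."""
    for i in range(len(sequence) - k + 1):
        yield i + 1, sequence[i:i + k]


def strong_binders(predictions, max_rank=1.0):
    """Keep prediction rows whose percentile rank is at or below max_rank."""
    return [p for p in predictions if p["percentile_rank"] <= max_rank]


toy_sequence = "ACDEFGHIKLM"       # hypothetical 11-residue sequence
invented_ranks = [0.4, 1.0, 2.5]   # invented values, not tool output

predictions = [
    {"start": start, "peptide": pep, "allele": "allele-1", "percentile_rank": rank}
    for (start, pep), rank in zip(ninemers(toy_sequence), invented_ranks)
]
kept_peptides = [p["peptide"] for p in strong_binders(predictions)]
print(kept_peptides)
```

Calling `strong_binders(predictions, max_rank=10.0)` would apply the looser cut-off that the study uses for MHC class II predictions.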
2.5.2. MHC Class II Binding Predictions
Peptide binding to MHC class II molecules was assessed with the IEDB MHC II prediction tool (https://tools.immuneepitope.org/mhcii/). For MHC-II binding prediction, the reference set of alleles was used, which includes the HLA-DQ, HLA-DP, and HLA-DR alleles that are most frequent worldwide. The MHC class II groove can bind peptides of different lengths. The IEDB MHC II prediction tool offers seven prediction methods; NetMHCIIpan was used in this study. Conserved epitopes that bound to alleles with a percentile rank of 10 or less were selected for further analysis, as recommended in (Selecting thresholds (cut-offs) for MHC class I and II binding predictions, https://help.iedb.org/entries/23854373-Selecting-thresholds-cut-offs-for-MHC-class-I-and-II-binding-predictions) [7, 11, 12, 13, 14].
2.5.3. Proteasomal Cleavage/TAP transport/MHC Class I Combined Predictor
This tool combines predictors of proteasomal processing, TAP transport, and MHC binding to produce an overall score for each peptide's intrinsic potential of being a T cell epitope. In this study, NetMHCpan was used with immunoproteasomal cleavage prediction. There are two types of proteasomes: the constitutively expressed 'house-keeping' type, and immunoproteasomes, which are induced by IFN-γ secretion. Results are reported as a proteasome score, TAP score, MHC score, processing score, total score, and IC50 score. Explanation of the prediction output:
Proteasome cleavage - The scores can be interpreted as logarithms of the total amount of cleavage site usage liberating the peptide C-terminus; the actual amount also depends on other factors, e.g. the amount of source protein degraded.
TAP transport - The TAP score estimates an effective -log(IC50) value for the binding to TAP of a peptide or its N-terminally prolonged precursors.
MHC binding - The MHC binding prediction is identical to the class I prediction, with output given as -log(IC50) values.
Processing - this score combines the proteasomal cleavage and TAP transport predictions. It predicts a quantity proportional to the amount of peptide present in the ER, where a peptide can bind to multiple MHC molecules. This allows predicting T-cell epitope candidates independent of MHC restriction.
Total - this score combines the proteasomal cleavage, TAP transport and MHC binding predictions. It predicts a quantity proportional to the amount of peptide presented by MHC molecules on the cell surface. High scores mean high efficiency.
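How the individual scores relate to one another can be sketched as below. The text says only that the processing and total scores "combine" the component predictions; the sketch assumes a simple sum of the log-scale scores and a base-10 logarithm for the -log(IC50) conversion. Both are assumptions made for illustration, as are the function names and the example numbers.

```python
def processing_score(proteasome, tap):
    """Processing score, assumed here to be the sum of the two log-scale scores."""
    return proteasome + tap


def total_score(proteasome, tap, mhc):
    """Total score, assumed here to be processing score plus MHC binding score."""
    return processing_score(proteasome, tap) + mhc


def ic50_from_score(mhc_score):
    """Invert a -log10(IC50) score back to an IC50 value (base 10 is an assumption)."""
    return 10 ** (-mhc_score)


# Invented example values, not output of the IEDB tool.
example_total = total_score(proteasome=1.2, tap=0.3, mhc=-2.0)
example_ic50 = ic50_from_score(-2.0)
print(example_total, example_ic50)
```

Under these assumptions a higher (less negative) MHC score corresponds to a lower IC50, that is, to stronger predicted binding, which is why higher total scores are read as higher efficiency.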
2.5.4. Neural Network Based Prediction of Proteasomal Cleavage Sites (NetChop) and T Cell Epitopes (NetCTL and NetCTLpan)
NetChop, which was used here, is a neural network-based predictor of proteasomal processing. NetCTL and NetCTLpan are predictors of T cell epitopes along a protein sequence. The positive prediction thresholds were 0.5, 0.75, and 1 for NetChop, NetCTL, and NetCTLpan, respectively; predictions at or above the threshold are displayed in green, and predictions below the threshold in red.
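The thresholding of per-position scores can be expressed in a few lines. This is a sketch only: the threshold values are those stated in the text, while the function name and the score list are invented and do not represent output of NetChop or NetCTL.

```python
# Thresholds as stated in the text for NetChop, NetCTL, and NetCTLpan.
THRESHOLDS = {"NetChop": 0.5, "NetCTL": 0.75, "NetCTLpan": 1.0}


def positive_positions(tool, scores):
    """Return 1-based positions whose score is at or above the tool's threshold."""
    cutoff = THRESHOLDS[tool]
    return [i + 1 for i, score in enumerate(scores) if score >= cutoff]


# Invented per-position scores for illustration only.
toy_scores = [0.12, 0.50, 0.91, 0.49, 0.77]
netchop_hits = positive_positions("NetChop", toy_scores)
netctl_hits = positive_positions("NetCTL", toy_scores)
print(netchop_hits, netctl_hits)
```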
2.5.5. MHC-NP: Prediction of Peptides Naturally Processed by the MHC
MHC-NP uses data obtained from MHC elution experiments to assess the probability that a given peptide is naturally processed and binds to a given MHC molecule. The tool used in this study was the winner of the 2nd Machine Learning Competition in Immunology. It is built on three groups of peptides: binders, non-binders, and eluted peptides, the last being considered naturally processed peptides; a higher probability score therefore indicates a naturally processed peptide.
2.6. Epitope Analysis Tools
2.6.1. Population Coverage Calculation
All potential MHC I and MHC II binders from the spike glycoprotein, the E protein, and the modified S and E sequences were assessed for population coverage against the whole world population, with particular attention to Saudi Arabia and the other countries that have reported MERS-CoV. Calculations were performed with the IEDB population coverage calculation tool (https://tools.iedb.org/tools/population/iedb_input), using the MHC-I and MHC-II alleles that interacted with the selected epitopes. The tool computes the projected population coverage, the average number of epitope hits / HLA combinations recognized by the population, and the minimum number of epitope hits / HLA combinations recognized by 90% of the population (PC90).
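The notion of projected population coverage can be illustrated with a simplified calculation. This sketch is not the algorithm of the IEDB tool: it assumes Hardy-Weinberg proportions within each locus and independence between loci, and the allele labels and frequencies are invented.

```python
def locus_coverage(allele_freqs, covered):
    """Fraction of diploid individuals carrying at least one covered allele.

    Assumes Hardy-Weinberg proportions: 1 - (1 - p)^2, where p is the summed
    frequency of the alleles that bind at least one selected epitope.
    """
    p = sum(freq for allele, freq in allele_freqs.items() if allele in covered)
    return 1.0 - (1.0 - p) ** 2


def combined_coverage(per_locus):
    """Combine per-locus coverages, assuming the loci are independent."""
    uncovered = 1.0
    for coverage in per_locus:
        uncovered *= 1.0 - coverage
    return 1.0 - uncovered


# Invented allele frequencies for two hypothetical loci.
locus_a = {"a1": 0.3, "a2": 0.2, "a3": 0.5}
locus_b = {"b1": 0.2, "b2": 0.8}

cov_a = locus_coverage(locus_a, covered={"a1", "a2"})
cov_b = locus_coverage(locus_b, covered={"b1"})
cov_total = combined_coverage([cov_a, cov_b])
print(cov_a, cov_b, cov_total)
```

The sketch shows why adding epitopes restricted by alleles at a second locus raises coverage even when each locus alone covers only part of the population.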
2.7. Homology Modeling
The complete 3D structures of the spike glycoprotein and the envelope protein were obtained with Phyre2 (https://www.sbg.bio.ic.ac.uk/phyre2), which uses advanced remote homology detection methods to build 3D models. UCSF Chimera (version 1.8), available from the Chimera web site (https://www.cgl.ucsf.edu/chimera), was used to visualize the 3D structures. Homology modeling was performed to further verify the surface accessibility and hydrophilicity of the predicted B lymphocyte epitopes, and to visualize all predicted T cell epitopes at the structural level.
In addition to the above methods, further software tools were used to assess the effect of the amino acid changes (treated as SNPs, single nucleotide polymorphisms) introduced into the S and E reference sequences.
2.8. Confirmation of Amino Acid Change in Spike Glycoprotein (S) & Envelope Protein (E) Sequence
1. PolyPhen-2
PolyPhen-2 (Polymorphism Phenotyping v2) (https://genetics.bwh.harvard.edu/pph2/index.shtml) is an online bioinformatics program that automatically predicts the consequence of an amino acid change on the structure and function of a protein. The program searches several protein structure databases for 3D protein structures, multiple alignments of homologous sequences, and amino acid contact information, then calculates position-specific independent count (PSIC) scores for each of the two variants and computes the difference between them. PolyPhen-2 scores were assigned as probably damaging (2.00 or more), possibly damaging (1.40–1.90), potentially damaging (1.0–1.50), or benign (0.00–0.90). PolyPhen-2 accepts input in the form of SNPs or protein sequences (Mohamed et al, 2014).
2. I-Mutant Suite
I-Mutant version 3.0 (https://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi) was used to predict protein stability changes upon single-site mutations. I-Mutant3.0 can evaluate the stability change of a single-site mutation starting from either the protein structure or the protein sequence. The program was trained on a data set derived from ProTherm, which is considered the most comprehensive database of experimental data on protein mutations (Mohamed et al, 2014).
3. Project Hope Mutation
HOPE version 1.1.0 (https://www.cmbi.ru.nl/hope/) is an easy-to-use web service that analyses the structural effects of a point mutation in a protein sequence.
4. SNPs & GO
SNPs & GO (https://snps.biofold.org/snps-and-go//snps-and-go.html) was used to predict disease-associated variations using GO terms. It collects, in a unique framework, information derived from the protein sequence, 3D structure, protein sequence profile, and protein function, together with Gene Ontology annotation, to predict whether a given variation can be classified as disease-related or neutral. It reports results from three methods, which differ in SVM type and input data:
PANTHER: Output of the PANTHER algorithm
PhD-SNP: SVM input is the sequence and profile at the mutated position
SNPs & GO: SVM input is all the input of PhD-SNP and PANTHER plus GO term features; the output is a disease probability (a mutation with probability >0.5 is predicted as Disease).
2.9. Peptide Search Tool
The Peptide search tool (https://www.uniprot.org/peptidesearch/) was used to find all UniProtKB sequences that exactly match a query peptide sequence. This means that the desired peptides can readily be synthesized in the laboratory, for example by cloning methods, in order to study their impact on the immune system by injecting laboratory animals with the peptide sequence of any organism.
2.10. AllerHunter
AllerHunter (https://tiger.dbs.nus.edu.sg/AllerHunter/index.html) is a cross-reactive allergen prediction program built on a combination of Support Vector Machine (SVM) and pairwise sequence similarity. Query sequences can be evaluated with both AllerHunter and the FAO/WHO evaluation scheme. In AllerHunter, a sequence is considered a cross-reactive allergen if its probability is >=0.06, while the FAO/WHO guideline states that a sequence is potentially allergenic if it has either an identity of at least 6 contiguous amino acids OR >35 percent sequence identity over a window of 80 amino acids when compared with known allergens.
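The first FAO/WHO criterion, an exact match of at least 6 contiguous amino acids, is simple enough to sketch directly. The code below covers only that criterion; the 80-residue identity window requires a sequence alignment and is not shown. The function name and both sequences are hypothetical and do not come from AllerHunter or from any allergen database.

```python
def shared_kmers(query, allergen, k=6):
    """Return the sorted k-mers that occur in both sequences (exact matches)."""
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return sorted(query_kmers & allergen_kmers)


def flagged_by_six_mer_rule(query, allergen):
    """True if the query shares at least one exact 6-residue stretch with the allergen."""
    return bool(shared_kmers(query, allergen, k=6))


# Hypothetical sequences for illustration only.
toy_query = "MKTLLVAGSPQR"
toy_allergen = "AAVAGSPQTT"
matches = shared_kmers(toy_query, toy_allergen)
print(matches, flagged_by_six_mer_rule(toy_query, toy_allergen))
```

In practice the query would be compared against every sequence in a curated allergen set, and a single shared 6-mer would be enough to flag it under this criterion.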
2.11. AlgPred: Prediction of Allergenic Proteins and Mapping of IgE Epitopes
AlgPred (https://www.imtech.res.in/raghava/algpred/index.html) was used to predict allergenic proteins and to map IgE epitopes. The server offers the following:
1- It allows prediction of allergens based on similarity of a known epitope to any region of the protein.
2- The IgE epitope mapping feature of the server allows users to locate the position of epitopes in their protein.
3- The server searches MEME/MAST allergen motifs using MAST and assigns a protein as an allergen if it has any motif.
4- It allows prediction of allergens based on SVM modules using amino acid or dipeptide composition.
5- It facilitates a BLAST search against 2890 allergen-representative peptides (ARPs) obtained from Bjorklund et al 2005 and assigns a protein as an allergen if it has a BLAST hit.
6- The hybrid option of the server allows prediction of allergens using a combined approach (SVMc + IgE epitope + ARPs BLAST + MAST).
2.12. VaxiJen v2.0
VaxiJen (https://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen_help.html) is the first server for alignment-independent prediction of protective antigens. It was developed to allow antigen classification solely based on the physicochemical properties of proteins without recourse to sequence alignment.
3. Results
3.1. Prediction of B-cell Epitopes
The spike glycoprotein, the E protein, and the modified S and E proteins were subjected to the BepiPred linear epitope prediction, Emini surface accessibility, Kolaskar and Tongaonkar antigenicity, Parker hydrophilicity, Chou and Fasman beta turn, and Karplus & Schulz flexibility prediction methods in IEDB; the results are shown in charts 1-24.
3.1.1. BepiPred Linear Epitope Prediction Method
The average binding score of the spike glycoprotein for B cells was 0.35; all values equal to or greater than the default threshold of 0.35 were predicted to be potential B-cell binders.
3.1.2. Emini Surface Accessibility Prediction
The average surface accessibility score of the protein was 1.000; all values equal to or greater than the default threshold of 1.0 were regarded as potentially on the surface. The number of positive peptides was 481 out of 1349 in S glycoprotein and 23 out of 77 in E protein, and 485 out of 485 and 17 out of 77 in the modified S and E sequences, respectively.
3.1.3. Kolaskar and Tongaonkar Antigenicity
The default antigenicity threshold of the protein was 1.045; all values greater than 1.045 were considered potential antigenic determinants. The number of positive peptides was 655 out of 1348 in S glycoprotein and 55 out of 76 in E protein, and 668 out of 668 and 47 out of 76 in the modified S and E sequences, respectively.
3.1.4. Parker hydrophilicity prediction
The average hydrophilicity score of the protein was 1.286; all values equal to or greater than the default threshold of 1.286 were considered potentially hydrophilic. The number of positive peptides was 693 out of 1348 in S glycoprotein and 18 out of 76 in E protein, and 690 out of 695 and 20 out of 76 in the modified S and E sequences, respectively.
3.1.5. Chou and Fasman Beta Turn Prediction
The default threshold for identifying sites that contain beta turns was 1.009; all values equal to or greater than this threshold were considered beta turn sites. The number of positive peptides was 668 out of 1348 in S glycoprotein and 19 out of 76 in E protein, and 673 out of 673 and 21 out of 76 in the modified S and E sequences, respectively.
3.1.6. Karplus & Schulz Flexibility Prediction
The default threshold for chain flexibility was 0.992; all values equal to or greater than this threshold were considered flexible regions of the protein. The number of positive peptides was 679 out of 1347 in S glycoprotein and 24 out of 24 in E protein, and 680 out of 681 and 24 out of 75 in the modified S and E sequences, respectively.
The most common B cell epitope for the E protein is YVKFQDS at position 69, while for the modified E sequence they are VYVPQQD, YVPQQDS, and PPLPED / PPLPEDV at positions 68, 69, and 77, respectively.
The most common B cell epitopes shared by S and modified S, with their positions in parentheses, are: DVGPDSV (23), PDSVKSA (26), DSVKSAC (27), PRPIDVS (48), HTPATDC (211), AKPSGSV (371), KPSGSVV (372), SGTPPQV (393), GTPPQVY (394), TPPQVYN (395), QLSPLEG (547), YGPLQTP (707), PRSVRSV (750), RSVRSVP (751), SVKSSQS (855), VKSSQSS (856), SQSSPII (859, or 857 in modified S), and SLNTKYV (1202). QVDQLNS (772) and VDQLNSS (773) were found only in the original S glycoprotein, while LTPTSSY (15), TPTSSYV (16), PTSSYVD (17), TSSYVDV (18), DHGDYYV (83), YSQDVKQ (108), ANQYSPC (523), NQYSPCV (524), and YYRKQLS (543) were found only in the modified S glycoprotein sequence.
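Several of the predicted 7-mers overlap one another, so they can be merged into longer contiguous regions. The following sketch shows one way to do this using the first four S glycoprotein epitopes and positions reported above. The merging procedure and function name are illustrative additions, not part of the study's method; the merge depends only on the offsets between positions, so it holds whether a position denotes the start or the centre of a window.

```python
def merge_windows(hits):
    """Merge overlapping or adjacent (position, peptide) windows into regions."""
    regions = []
    for pos, pep in sorted(hits):
        if regions:
            start, seq = regions[-1]
            end = start + len(seq)  # first position after the current region
            if pos <= end:
                if pos + len(pep) > end:
                    overlap = end - pos
                    # The overlapping residues must agree between the two windows.
                    if overlap and seq[-overlap:] != pep[:overlap]:
                        raise ValueError(f"inconsistent overlap at position {pos}")
                    regions[-1] = (start, seq + pep[overlap:])
                continue  # window merged or already fully contained
        regions.append((pos, pep))
    return regions


# Epitopes and positions as reported in the text for S glycoprotein.
reported = [(23, "DVGPDSV"), (26, "PDSVKSA"), (27, "DSVKSAC"), (48, "PRPIDVS")]
merged = merge_windows(reported)
print(merged)
```

Here the three windows at positions 23, 26, and 27 collapse into a single 11-residue stretch, while the window at position 48 remains separate.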
The spike glycoprotein, the E protein, and the modified S and E sequences were subjected, within IEDB, to the Consensus method for MHC-I binding; NetMHCIIpan for MHC-II binding; NetMHCpan for the proteasomal cleavage/TAP transport/MHC class I combined predictor; NetChop and NetCTL for neural network-based prediction of proteasomal cleavage sites (NetChop) and T cell epitopes (NetCTL and NetCTLpan); and MHC-NP for prediction of peptides naturally processed by the MHC.
3.2.1. MHC Class I Binding Predictions
Peptide binding to MHC class I molecules was analysed with the Consensus method, selecting conserved epitopes that bound to alleles with a percentile rank of 1.0 or less. The number of positive peptides was 602 out of 53800 in S glycoprotein and 63 out of 3626 in E protein, and 612 out of 58457 and 41 out of 3234 in the modified S and E sequences, respectively.
Seven alleles were not found for the modified E sequence: HLA-A*03:01, HLA-A*11:01, HLA-A*31:01, HLA-A*68:01, HLA-B*14:02, HLA-B*40:01, and HLA-B*40:02. Four alleles were not found for the E protein: HLA-B*48:01, HLA-B*58:02, HLA-C*04:01, and HLA-E*01:01. The remainder of the alleles are common to both. Three peptide sequences are also common to both, namely CMTGFNTLLn, MTGFNTLLVn, and QCMTGFNTLn, while HLCVQCMTG, KPPLPEDVW, LLVCTAFLT, LLVQPALSL, LTATHLCVQ, LVCTAFLTA, PALSLYMTG, PNFFDFTVVn, SLYMTGRSV, VCTAFLTAT, VQERIGWFI, VQPALSLYM, VVCDITLLV, and WFIPNFFDFn were found only in the modified E sequence.
In the E protein, the HLA-A*02:01 allele showed the highest frequency (six), followed by HLA-A*23:01, HLA-A*29:02, HLA-A*68:02, and HLA-B*46:01 (four each), and the same held for the peptide sequences FIFTVVCAI, ITLLVCMAF, IVNFFIFTVn, and LVQPALYLY. In the modified E sequence, HLA-C*03:03 showed a much higher frequency (forty-three), while HLA-A*02:01, HLA-A*02:06, HLA-A*29:02, and HLA-B*38:01 each had a frequency of three.
Among the peptide sequences of the E protein, FIFTVVCAI had the highest frequency (five), followed by ITLLVCMAF, IVNFFIFTVn, and LVQPALYLY. In the modified E sequence, by contrast, LVQPALSLY had the highest frequency (five), followed by CMTGFNTLLn, FLTATHLCV, FVQERIGWF, ITLLVCTAF, LYMTGRSVY, WFIPNFFDFn, and YMTGRSVYV (four each), and QCMTGFNTLn (three).
N.B: a trailing n indicates the presence of asparagine (N) in a peptide sequence, which hides the epitope from recognition by the immune system, so the common epitopes should be treated with caution. There are 11 asparagine-containing peptide sequences in E and 13 in modified E, and 8 in S and 46 in the modified S sequence.
The HLA-A*30:02 allele was not found for the modified S glycoprotein sequence, while HLA-B*38:01, HLA-B*39:01, HLA-B*40:01, HLA-B*40:02, HLA-B*44:02, HLA-B*44:03, HLA-B*46:01, HLA-B*48:01, HLA-B*51:01, and HLA-B*53:01 were not found for the S sequence but were found for the modified S sequence. Consequently, 15 peptide sequences were absent from the S sequence but present in the modified S sequence (AGYKVLPPL, APQVTYQNIn, CKLPLGQSL, CVFFILCCV, DVKQFDNGFn, DYYVYSAGH, FKLSIPTNFn, FLLTPTSSY, GEMRLASIA, GNYTYYHKWn, GPASARDLI, GTDTNSVCIn, HKWPWYIWL, HSKFLLMFL, IAPVNGYFIn). In addition, the modified S sequence lacks 34 peptide sequences, such as AGPISQFNYn, CMGKLKCNRn, DLSQLHCSY, DVKQFANGFn, FATYHTPAT, FLLTPTESY, FQFATLPVY, FVYDAYQNLn, GTNCMGKLKn, GVRQQRFVY, HSVFLLMFL, ICAQYVAGY, … etc.; the other peptide sequences are not shown here.
In S glycoprotein, the HLA-A*29:02 allele showed the highest frequency (41), followed by HLA-A*30:02 (37), HLA-A*01:01 (31), HLA-B*15:01 (29), HLA-C*14:02 (27), HLA-A*25:01 (25), HLA-A*23:01 (24), HLA-B*58:01 (23), and HLA-C*06:02 (22). The modified S glycoprotein sequence partially shared the same high-frequency alleles: HLA-A*29:02 again had the highest frequency (33), followed by HLA-C*14:02 (27), HLA-A*01:01 (25), HLA-B*46:01 (22), HLA-A*23:01, HLA-B*58:01, and HLA-C*06:02 (21 each), and HLA-B*15:01 (20). In S glycoprotein, the peptide sequences with the highest frequencies were FSFGVTQEY and ITYQGLFPY (10 each), WSYTGSSFY (8), KAWAAFYVY (7), FVYDAYQNLn, ITITYQGLF, and QTAQGVHLF (6 each), and FQFATLPVY, NSYTSFATYn, SLILDYFSY, STVWEDGDY, VSVPVSVIY, and YTYYNKWPWn (5 each). In modified S glycoprotein the frequencies differed: FSFGVTQEY (10); FLLTPTSSY, FSSRYVDLY, FVANYSQDVn, FYVYKLQPL, and IAFNHPIQVn (4 each); and ASIAFNHPIn, DEILEWFGI, DYFSYPLSM, EAAYTSSLL, FCSKINQALn, FFNHTLVLLn, FQDELDEFF, FSDGKMGRF, FSNPTCLILn, GEMRLASIA, GRFFNHTLVn, HISSTMSQY, and HKWPWYIWL (3 each).
N.B: a trailing n indicates the presence of asparagine (N) in a peptide sequence, which hides the epitope from recognition by the immune system.
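The frequency counts reported in this section amount to tallying how often each allele and each peptide appears among the selected peptide-allele pairs, and the asparagine flag is a simple residue check. A minimal sketch follows; the peptide-allele pairs are invented placeholders rather than results of the study, and the function names are hypothetical.

```python
from collections import Counter


def tally(pairs):
    """Count how many selected (peptide, allele) pairs involve each allele and peptide."""
    allele_counts = Counter(allele for _, allele in pairs)
    peptide_counts = Counter(peptide for peptide, _ in pairs)
    return allele_counts, peptide_counts


def contains_asparagine(peptide):
    """True if the peptide contains asparagine (one-letter code N)."""
    return "N" in peptide


# Invented peptide-allele pairs for illustration only.
toy_pairs = [
    ("ACDEFGHIK", "allele-1"),
    ("ACDEFGHIK", "allele-2"),
    ("LMNPQRSTV", "allele-1"),
]
allele_counts, peptide_counts = tally(toy_pairs)
flagged = sorted({pep for pep, _ in toy_pairs if contains_asparagine(pep)})
print(allele_counts.most_common(), peptide_counts.most_common(), flagged)
```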
3.2.2. MHC Class II Binding Predictions
Peptide binding to MHC class II molecules was analysed by selecting conserved epitopes that bound to alleles with a percentile rank of 10 or less. The number of selected epitopes was 212 out of 4819 in S glycoprotein and 685 out of 4148 in E protein, and 6896 out of 75206 and 685 out of 4148 in the modified S and E proteins, respectively.
The following alleles are common to S glycoprotein, E protein, and the modified S and E sequences: HLA-DPA1*01:03/DPB1*02:01, HLA-DPA1*02:01/DPB1*01:01, HLA-DRB1*01:01, HLA-DRB1*01:02, HLA-DRB1*04:04, HLA-DRB1*04:05, HLA-DRB1*04:08, HLA-DRB1*04:10, HLA-DRB1*04:23, HLA-DRB1*07:01, HLA-DRB1*07:03, HLA-DRB1*08:06, HLA-DRB1*11:04, HLA-DRB1*11:06, HLA-DRB1*12:01, HLA-DRB1*13:04, HLA-DRB1*13:11, HLA-DRB1*13:21, and HLA-DRB4*01:01. S and modified S glycoprotein both also involve 42 other alleles that are not shown here. In E and modified E protein, HLA-DRB1*01:01 had the highest frequency (20), followed by HLA-DRB1*01:02 (17), HLA-DRB1*12:01 (11), HLA-DRB1*11:04, HLA-DRB1*11:06, and HLA-DRB1*13:11 (10 each), and HLA-DRB1*07:01, HLA-DRB1*07:03, and HLA-DRB1*13:21 (9 each). In S and modified S glycoprotein, the following alleles had the highest frequencies (given as S/modified S): HLA-DRB1*04:08 (200/199); HLA-DRB1*04:01, HLA-DRB1*04:21, and HLA-DRB1*04:26 (199/201); HLA-DRB1*09:01 (194/190); HLA-DRB1*04:05 (192/189); HLA-DRB1*07:01 and HLA-DRB1*07:03 (167/167); HLA-DRB1*15:02 (164/167); HLA-DRB1*13:02 (160/159); HLA-DRB1*11:14, HLA-DRB1*11:20, and HLA-DRB1*13:23 (159/159); and HLA-DRB3*01:01 (152/158).
E and modified E protein had the same peptide sequences with the same frequencies. The highest frequencies were 15 for GFNTLLVQPALSLYMn, 14 for TGFNTLLVQPALSLYn, 13 for FNTLLVQPALSLYMT, 12 for MTGFNTLLVQPALSLn, 11 for NTLLVQPALSLYMTGn, and 10 for each of ALSLYMTGRSVYVPQ, LSLYMTGRSVYVPQQ, PALSLYMTGRSVYVP, and QPALSLYMTGRSVYV.
N.B.:
1- The following alleles were not available for S glycoprotein, E protein or the modified S and E sequences: DPA1*01-DPB1*04:01, DRB1*03:09, DRB1*08:17, DRB1*13:28.
2- The same peptide sequence could be shared by more than one allele, and the same allele could have different peptide sequences.
3- The frequencies of both alleles and peptide sequences varied when the reference sequences of S and E protein were compared with their modified sequences.
4- The letter n in the peptide sequences above indicates the presence of arginine in the sequence.
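The selection rule used in this subsection (keep predictions with a percentile rank of 10 or lower, then count how often each allele and each peptide occurs) can be sketched as follows. This is only a minimal illustration and not the IEDB implementation; the row layout, the function name `select_binders` and the sample percentile ranks are hypothetical.

```python
from collections import Counter

# Hypothetical prediction rows: (allele, peptide, percentile_rank).
# The ranks are invented for illustration; they are not results of this study.
predictions = [
    ("HLA-DRB1*01:01", "GFNTLLVQPALSLYM", 2.5),
    ("HLA-DRB1*01:02", "GFNTLLVQPALSLYM", 8.0),
    ("HLA-DRB1*01:01", "TGFNTLLVQPALSLY", 10.0),
    ("HLA-DRB1*07:01", "FNTLLVQPALSLYMT", 35.0),
]

def select_binders(rows, max_rank=10.0):
    """Keep rows whose percentile rank is less than or equal to max_rank."""
    return [row for row in rows if row[2] <= max_rank]

selected = select_binders(predictions)

# Frequency of each allele and each peptide among the selected rows.
allele_counts = Counter(allele for allele, _, _ in selected)
peptide_counts = Counter(peptide for _, peptide, _ in selected)

print(len(selected), "of", len(predictions), "predictions selected")
print(allele_counts.most_common())
print(peptide_counts.most_common())
```

Counting alleles and peptides separately mirrors note 2 above: one peptide may appear under several alleles, and one allele may appear with several peptides.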
3.2.3. Proteasomal Cleavage/TAP Transport/MHC Class I Combined Predictor
In NetMHCpan, a high score means high efficiency, because the tool predicts a quantity proportional to the amount of peptide presented by MHC molecules on the cell surface. A total score of 0 or higher was selected for S and modified S glycoprotein, a total score of 0.3 or higher for E protein, and a total score of -2.82 or higher for modified E protein; see Table 3 and Table 4.
3.2.4. Neural Network Based Prediction of Proteasomal Cleavage Sites (NetChop) and T-Cell Epitopes (NetCTL and NetCTLpan)
The positive prediction thresholds were 0.5 for NetChop and 0.75 for NetCTL (green colour); predictions at or above these thresholds were considered proteasomal cleavage sites for T-cell epitopes, see Charts 25-38 and Table 5.
A NetChop prediction score of 0.5 or greater was considered a positive result. In S glycoprotein, more than 300 peptides out of 1353 were positive; in modified S glycoprotein, 5 out of 66; in E protein, 28 out of 82; and in modified E protein, 28 out of 82.
Both E and modified E protein showed 28 amino acids that crossed the 0.5 threshold. The following residues were at the same positions in both: F → 33; L → 58, 50, 39, 51, 28, 56, 2; Q → 70; R → 63; Y → 59 and 66; V → 67, 65, 41, 21, 22, 52, 29. The exceptions were: V → 82 in E protein but position 10 in modified E protein; L → 76 in E protein but positions 34 and 6 in modified E protein; F → 69 in E protein but positions 17 and 19 in modified E protein; W → 81 in E protein but position 11 in modified E protein; R → 38, I → 18, K → 68 and 73, M → 60 and Y → 57 in E protein; and A → 32 in modified E protein.
N.B.:
1- The peptide sequences of E and modified E protein differed even where they had similar residue positions.
2- NetCTL was used only for E and modified E protein, because using it with S glycoprotein produced large amounts of data and was time-consuming.
3- The NetCTL charts for modified E protein are not shown here.
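The thresholding described in this subsection can be illustrated with a short sketch. It assumes per-residue scores are already available as (residue, score) pairs; the function name `positions_at_or_above` and the example scores are hypothetical and are not output from NetChop or NetCTL.

```python
NETCHOP_THRESHOLD = 0.5   # cut-off for NetChop, as stated in the text
NETCTL_THRESHOLD = 0.75   # cut-off for NetCTL, as stated in the text

def positions_at_or_above(residue_scores, threshold):
    """Return (position, residue) pairs whose score reaches the threshold.

    residue_scores is a list of (residue, score) pairs; positions are 1-based.
    """
    return [
        (position, residue)
        for position, (residue, score) in enumerate(residue_scores, start=1)
        if score >= threshold
    ]

# Hypothetical scores for a short stretch of sequence.
scores = [("M", 0.12), ("L", 0.81), ("P", 0.07), ("F", 0.50), ("V", 0.49)]

print(positions_at_or_above(scores, NETCHOP_THRESHOLD))
print(positions_at_or_above(scores, NETCTL_THRESHOLD))
```

Reporting the position together with the residue makes it straightforward to compare two sequences residue by residue, as done above for E and modified E protein.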
3.2.5. MHC-NP: Prediction of Peptides Naturally Processed by the MHC
Peptides with a probe score greater than 0 were considered naturally processed; the greater the probe score, the more likely the peptide is naturally processed.
The total number of positive epitopes (naturally processed peptides) was 10189 out of 10760 in S glycoprotein, 10187 out of 10760 in modified S glycoprotein, 568 out of 592 in E protein and 566 out of 592 in modified E protein.
The allele frequencies in E protein were H-2-Db (74), H-2-Kb (74), HLA-A*02:01 (68), HLA-B*07:02 (66), HLA-B*35:01 (74), HLA-B*44:03 (74), HLA-B*53:01 (73) and HLA-B*57:01 (62), while in modified E protein they were H-2-Db (28), H-2-Kb (16), HLA-A*02:01 (5), HLA-B*07:02 (2), HLA-B*35:01 (6), HLA-B*44:03 (28), HLA-B*53:01 (60) and HLA-B*57:01 (4).
N.B.: modified E protein showed lower allele frequencies than E protein, in addition to some epitope differences even at the same positions.
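To make the proportions easier to compare, the counts reported above can be converted to percentages. The sketch below uses only the counts stated in the text; the dictionary layout and variable names are illustrative.

```python
# (peptides with probe score > 0, total peptides), as reported in the text.
counts = {
    "S glycoprotein": (10189, 10760),
    "modified S glycoprotein": (10187, 10760),
    "E protein": (568, 592),
    "modified E protein": (566, 592),
}

# Share of naturally processed peptides per sequence, in percent.
shares = {
    name: 100.0 * positive / total
    for name, (positive, total) in counts.items()
}

for name, share in shares.items():
    positive, total = counts[name]
    print(f"{name}: {positive}/{total} = {share:.2f}%")
```

In each case roughly 95% of the peptides have a positive probe score, and the reference and modified sequences differ by only two peptides.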
3.3. Epitope Analysis Tools
3.3.1. Population Coverage Calculation
Population coverage for the alleles interacting with MHC-I and MHC-II was computed with the IEDB population coverage calculation tool, which reports the average number of epitope hits / HLA combinations recognized by the population and the minimum number of epitope hits / HLA combinations recognized by 90% of the population (PC90); see the tables below.
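The idea behind a coverage percentage can be sketched with a simplified model. The sketch assumes Hardy-Weinberg proportions at each locus and independence between loci; it does not reproduce the IEDB tool, which also reports epitope-hit distributions and PC90. The function names and the allele frequencies are hypothetical.

```python
def locus_coverage(allele_freqs):
    """Fraction of diploid individuals carrying at least one listed allele.

    Assumes Hardy-Weinberg proportions at a single locus: an individual is
    uncovered only if both of its alleles fall outside the listed set.
    """
    covered_freq = min(sum(allele_freqs), 1.0)
    return 1.0 - (1.0 - covered_freq) ** 2

def combined_coverage(loci):
    """Coverage across several loci, assuming the loci are independent."""
    uncovered = 1.0
    for allele_freqs in loci:
        uncovered *= 1.0 - locus_coverage(allele_freqs)
    return 1.0 - uncovered

# Hypothetical frequencies of epitope-binding alleles at two loci.
hla_a = [0.25, 0.10]
hla_b = [0.15]

coverage = combined_coverage([hla_a, hla_b])
print(f"{100 * coverage:.2f}%")
```

Under these assumptions, a population in which none of the binding alleles has a recorded frequency yields 0% coverage, which is one way a value of 0% can arise in the tables.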
The following E protein epitopes were selected for the population coverage calculation:
PFVQER, VQERIG, QERIGL, FLTATR, LYLYNT, YLYNTG, LYNTGR, YNTGRS, NTGRSV, TGRSVY, RSVYVK, YVKFQD, VKFQDS, KFQDSK, FQDSKP, QDSKPP, DSKPPL, SKPPLP, KPPLPP, PPLPPD, PLPPDE, LPPDEW, PPDEWV, MLPFVQE, LPFVQER, PFVQERI, VQERIGL, RIGLFIV, IGLFIVN, GLFIVNF, LFIVNFF, FIVNFFI, IVNFFIF, VNFFIFT.
The population coverage percentages differed between MHC-I and MHC-II.
MHC-I coverage was similar for E and modified E protein, but MHC-II coverage still differed between them.
The following modified E protein epitopes were selected for the population coverage calculation:
RSVYVP, LYMTGR, VYVPQQ, PLPEDV, QERIGW, TGRSVY, YMTGRS, QFVQER, VPQQDS, SKPPLP, PPLPED, DSKPPL, YVPQQD, KPPLPE, QDSKPP, PQQDSK, QQDSKP, PLPEDVW, QFVQERI, AFLTATH, MLQFVQE, ALSLYMT, LQFVQER, VQCMTGF, YVPQQDS, GFNTLLV, PPLPEDV, FLTATHL, TGRSVYV, PALSLYM, NTLLVQP, FNTLLVQ, LPEDVWV, CTAFLTA.
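The two epitope lists above can be compared directly with set operations. The sketch below copies both lists from the text; the helper `parse` and the variable names are illustrative.

```python
def parse(text):
    """Split a comma-separated epitope list into a set of sequences."""
    return {item.strip() for item in text.split(",") if item.strip()}

e_epitopes = parse("""
PFVQER, VQERIG, QERIGL, FLTATR, LYLYNT, YLYNTG, LYNTGR, YNTGRS, NTGRSV,
TGRSVY, RSVYVK, YVKFQD, VKFQDS, KFQDSK, FQDSKP, QDSKPP, DSKPPL, SKPPLP,
KPPLPP, PPLPPD, PLPPDE, LPPDEW, PPDEWV, MLPFVQE, LPFVQER, PFVQERI,
VQERIGL, RIGLFIV, IGLFIVN, GLFIVNF, LFIVNFF, FIVNFFI, IVNFFIF, VNFFIFT
""")

modified_e_epitopes = parse("""
RSVYVP, LYMTGR, VYVPQQ, PLPEDV, QERIGW, TGRSVY, YMTGRS, QFVQER, VPQQDS,
SKPPLP, PPLPED, DSKPPL, YVPQQD, KPPLPE, QDSKPP, PQQDSK, QQDSKP, PLPEDVW,
QFVQERI, AFLTATH, MLQFVQE, ALSLYMT, LQFVQER, VQCMTGF, YVPQQDS, GFNTLLV,
PPLPEDV, FLTATHL, TGRSVYV, PALSLYM, NTLLVQP, FNTLLVQ, LPEDVWV, CTAFLTA
""")

shared = e_epitopes & modified_e_epitopes          # present in both lists
only_reference = e_epitopes - modified_e_epitopes  # lost after modification
only_modified = modified_e_epitopes - e_epitopes   # gained after modification

print("shared:", sorted(shared))
print("only in E:", len(only_reference))
print("only in modified E:", len(only_modified))
```

Only a few of the selected epitopes appear in both lists, which is consistent with the differences in MHC-II coverage between E and modified E protein noted above.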
The population coverage was similar for the S glycoprotein reference sequence and modified S glycoprotein. For MHC-I it represented 95.60% of the world population; 118 countries showed a high percentage, most notably Chile Amerindian (100%), while 69 other countries showed 0%. Coverage was 94.80% in East Asia, 92.84% in South Korea and South Oriental Korea, 88.77% in China, 91.53% in Iran and Iran Persian but 0.00% in Iran Kurd, 76.80% in Jordan and Jordan Arab, 95.82% in Oman and Oman Arab, 96.38% in Saudi Arabia and Saudi Arabia Arab, 0.00% in United Arab Emirates and United Arab Emirates Arab, 86.43% in Sudan, 49.41% in Sudan Arab, 0.00% in Sudan Black and 87.06% in Sudan Mixed; see Table 6.
For MHC-II, the coverage was again similar for the S glycoprotein reference sequence and modified S glycoprotein and represented 81.81% of the world population; 64 countries showed a high percentage, most notably Norway and Norway Caucasoid (94.71%), while 59 other countries showed 0%. Coverage was 94.80% in East Asia, 85.32% in South Korea and South Oriental Korea, 59.99% in China, 64.22% in Iran, 55.78% in Iran Persian, 65.72% in Iran Kurd, 52.88% in Jordan and Jordan Arab, 0.00% in Oman and Oman Arab, 80.14% in Saudi Arabia and Saudi Arabia Arab, 32.92% in United Arab Emirates and United Arab Emirates Arab, 60.56% in Sudan, 0.00% in Sudan Arab, 0.00% in Sudan Black and 60.56% in Sudan Mixed; see Table 7.
For E protein, MHC-I coverage represented 95.60% of the world population; 116 countries showed a high percentage, most notably Chile Amerindian (100%), while 23 other countries showed more than 4% but less than 50%. Coverage was 94.80% in East Asia, 92.84% in South Korea and South Oriental Korea, 88.77% in China, 91.53% in Iran and Iran Persian, 76.80% in Jordan and Jordan Arab, 95.82% in Oman and Oman Arab, 96.38% in Saudi Arabia and Saudi Arabia Arab, 86.43% in Sudan, 49.41% in Sudan Arab, 0.00% in Sudan Black and 87.06% in Sudan Mixed; see Table 8. Iran Kurd, United Arab Emirates and United Arab Emirates Arab were not listed in the results of this tool.
For modified E protein, MHC-I coverage represented 95.60% of the world population; 112 countries showed a high percentage, most notably Chile Amerindian (100.00%), while 96 other countries showed 0%. Coverage was 94.80% in East Asia, 92.84% in South Korea and South Oriental Korea, 88.77% in China, 91.53% in Iran, 91.53% in Iran Persian, 0.00% in Iran Kurd, 76.80% in Jordan and Jordan Arab, 95.82% in Oman and Oman Arab, 96.38% in Saudi Arabia and Saudi Arabia Arab, 0.0% in United Arab Emirates and United Arab Emirates Arab, 60.56% in Sudan, 0.00% in Sudan Arab, 0.00% in Sudan Black and 60.56% in Sudan Mixed; see Table 9.
For E protein, MHC-II coverage represented 81.81% of the world population; 63 countries showed a high percentage, most notably Norway and Norway Caucasoid (94.71%), while 45 other countries showed from 0% to less than 50%. Coverage was 94.80% in East Asia, 85.32% in South Korea and South Oriental Korea, 59.99% in China, 64.22% in Iran, 65.72% in Iran Persian, 55.78% in Iran Kurd, 80.14% in Saudi Arabia and Saudi Arabia Arab, 32.92% in United Arab Emirates and United Arab Emirates Arab, and 60.56% in Sudan and Sudan Mixed; see Table 10. Oman, Jordan, Sudan Black and Sudan Arab were not listed in the results of this tool.
For modified E protein, MHC-II coverage represented 81.81% of the world population; 62 countries showed a high percentage, most notably Norway and Norway Caucasoid (94.71%), while 59 other countries showed 0%. Coverage was 94.80% in East Asia, 85.32% in South Korea and South Oriental Korea, 59.99% in China, 64.22% in Iran, 65.72% in Iran Persian, 55.78% in Iran Kurd, 52.88% in Jordan and Jordan Arab, 0.00% in Oman and Oman Arab, 80.14% in Saudi Arabia and Saudi Arabia Arab, 32.92% in United Arab Emirates and United Arab Emirates Arab, 60.56% in Sudan and Sudan Mixed, and 0.00% in Sudan Arab and Sudan Black; see Table 11.
The results of homology modeling are not shown here because they are not essential.
3.5. Confirmation of Amino Acid Change in Spike Glycoprotein (S) & Envelope Protein (E) Sequence
The results confirming the amino acid changes are not shown here because they are not essential.
3.6. Peptide Search Tool
The peptide search tool showed that the selected peptide sequences are present in other organisms such as Leishmania donovani, Drosophila sechellia (fruit fly), Leishmania infantum, Trypanosoma cruzi Dm28c, Strigamia maritima, Nocardioides dokdonensis, …, besides some species of Mycobacteria, Salmonella, Streptococcus, …. This may mean that the presence of these peptides in those organisms is related to respiratory disease, but further work is needed to confirm this suggestion. In addition, the desired peptides could be synthesized in the laboratory using one of these organisms (cloning techniques), which is easy and carries no risk of acquiring a very dangerous infection, and the impact of the peptide sequences on the immune system could be determined by injecting laboratory animals with the selected peptide sequences from any of these organisms.
3.7. AllerHunter: Cross-reactive Allergen Prediction Program
A sequence is considered a cross-reactive allergen if its probability is >= 0.06. Envelope (E) protein, spike (S) glycoprotein and modified S glycoprotein were predicted as potential non-allergens with scores of 0.01, 0.0 and 0.0 respectively, while the modified E protein sequence was too short for prediction (AllerHunter predicted the query sequence as a potential allergen with a score of 0.07). According to the FAO/WHO evaluation scheme for cross-reactive allergen prediction, the E and modified E protein sequences are classified as non-allergens because they do not meet its criteria, whereas S and modified S glycoprotein are classified as potential allergens because the query sequence matches at least one sequence in the AllerHunter data set with at least 35 percent identity over 80 amino acids.
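The FAO/WHO window criterion mentioned above (at least 35 percent identity over 80 amino acids) can be illustrated with a simplified sketch. It compares ungapped 80-residue windows only, whereas real tools use gapped alignment, so it is an assumption-laden approximation and not the AllerHunter method. The function names and the toy sequences are hypothetical.

```python
def window_identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def fao_who_window_hit(query, allergen, window=80, min_identity=0.35):
    """Return True if any pair of ungapped windows reaches min_identity.

    Simplification: windows are compared position by position without gaps.
    Sequences shorter than the window cannot produce a hit.
    """
    if len(query) < window or len(allergen) < window:
        return False
    for i in range(len(query) - window + 1):
        query_window = query[i:i + window]
        for j in range(len(allergen) - window + 1):
            if window_identity(query_window, allergen[j:j + window]) >= min_identity:
                return True
    return False

# Toy sequences, not real proteins.
query = "A" * 80
similar = "A" * 30 + "C" * 50      # 30 of 80 positions match (37.5%)
dissimilar = "A" * 20 + "C" * 60   # 20 of 80 positions match (25%)

print(fao_who_window_hit(query, similar))
print(fao_who_window_hit(query, dissimilar))
```

The sketch also shows why sequence length matters for this criterion: a sequence shorter than the window can never satisfy it.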
3.8. AlgPred: Prediction of Allergenic Proteins and Mapping of IgE Epitopes
AlgPred predicted all four sequences (S, E, modified S and modified E proteins) as non-allergens, as follows:
1- Prediction by mapping of IgE epitope: The protein sequence does not contain experimentally proven IgE epitope.
2- MAST RESULT: No Hits found; NON ALLERGEN.
3- BLAST Results of ARPS: No Hits found; NON ALLERGEN.
4- Prediction by Hybrid Approach: NON ALLERGEN/ ALLERGEN
There were slight differences between the four sequences in the SVM prediction methods based on amino acid composition and dipeptide composition, as in the tables below:
Table 13. SVM prediction methods based on dipeptide composition for the four protein sequences
The VaxiJen server classified three of the four protein sequences as probable antigens, as illustrated below:
S glycoprotein: Threshold for this model: 0.4; Overall Antigen Prediction = 0.4827 (Probable ANTIGEN).
Modified S glycoprotein: Threshold for this model: 0.4; Overall Antigen Prediction = 0.4907 (Probable ANTIGEN).
E protein: Threshold for this model: 0.4; Overall Antigen Prediction = 0.3811 (Probable NON-ANTIGEN).
Modified E protein: Threshold for this model: 0.4; Overall Antigen Prediction = 0.4417 (Probable ANTIGEN).
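The classification above can be reproduced from the reported scores with a few lines. The sketch assumes that a score at or above the threshold counts as a probable antigen; this reading is an assumption, although none of the reported scores equals the threshold exactly. The variable names are illustrative.

```python
VAXIJEN_THRESHOLD = 0.4  # threshold stated for this model in the text

# Overall antigen prediction scores as reported in the text.
scores = {
    "S glycoprotein": 0.4827,
    "Modified S glycoprotein": 0.4907,
    "E protein": 0.3811,
    "Modified E protein": 0.4417,
}

# Assumption: a score at or above the threshold is a probable antigen.
antigens = [name for name, score in scores.items() if score >= VAXIJEN_THRESHOLD]
non_antigens = [name for name in scores if name not in antigens]

print(len(antigens), "of", len(scores), "sequences are probable antigens")
print("probable non-antigens:", non_antigens)
```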
4. Discussion
Today there are many different approaches to developing a MERS-CoV vaccine. Some have partially succeeded and others have failed, while the remainder have neither succeeded nor failed because, for various reasons, they depend on software programs and still need to go through vaccine protocol processing. The S1 protein subunit, especially the RBD (the most mutable region, containing the mutation sites that define antibody escape variants), has been the basis for several MERS-CoV vaccine candidates in many studies; for example, RBD with aluminum salt or oil-in-water adjuvants can elicit neutralizing antibodies of high potency across multiple viral strains (Modjarrad K, 2016). Wang L et al, 2015 reported that full-length S DNA and a truncated S1 subunit glycoprotein can elicit a higher titer of neutralizing antibodies, and that this kind of immunization protected non-human primates (NHPs) from severe lung disease after intra-tracheal challenge with MERS-CoV. In another study, done in Iran by POORINMOHAMMAD N et al (2014), the computational prediction tools NetCTL 1.2 (Larsen et al., 2007), EpiJen (Doytchinova et al, 2006) and nHLAPred (Bhasin and Raghava, 2007), together with the PEPstr server for modeling (Kaur et al, 2007), were used to identify cytotoxic T-lymphocyte epitopes presented by human leukocyte antigen (HLA)-A*0201, as this is the most frequent HLA class I allele among Middle Eastern populations. Using the RBD, they selected the epitopes LLSGTPPQV, ILDYFSYPL, ILATVPHNL, NLTTITKPL, LQMGFGITV and FSNPTCLIL, but only LLSGTPPQV and FSNPTCLIL were considered real epitopes, because peptides with binding orientations closer to the native structure and lower binding free energy scores are ranked higher in their potential to be real epitopes. In contrast, another study, done by Shi J et al, 2015 using the Immune Epitope Database, reported that the nucleocapsid (N) protein of MERS-CoV might be a better protective immunogen, with high conservancy and the potential to elicit both neutralizing antibodies and T-cell responses, when compared with spike (S) protein. In addition, 71 peptides were identified as helper T-cell epitopes and 34 peptides as CTL epitopes; only the top 10 helper T-cell epitopes and CTL epitopes, ranked by the maximum number of HLA binding alleles and able to elicit protective cellular immune responses against MERS-CoV, were considered MERS vaccine candidates, and they cover 15 geographic regions (Shi J et al, 2015).
This study consists of two parts, the reference and modified sequences of both S glycoprotein and E protein. I found that the most common B-cell epitope that passed all B-cell prediction methods (IEDB prediction tool) for E protein is YVKFQDS at position 69, and for modified E protein the epitopes are VYVPQQD, YVPQQDS and PPLPED / PPLPEDV at positions 68, 69 and 77 respectively. For S and modified S glycoprotein they are DVGPDSV, PDSVKSA, DSVKSAC, PRPIDVS, HTPATDC, AKPSGSV, KPSGSVV, SGTPPQV, GTPPQVY, TPPQVYN, QLSPLEG, YGPLQTP, PRSVRSV, RSVRSVP, SVKSSQS, VKSSQSS, SQSSPII and SLNTKYV at positions 23, 26, 27, 48, 211, 371, 372, 393, 394, 395, 547, 707, 750, 751, 856, 859 (857 in modified S glycoprotein) and 1202 respectively. The epitopes QVDQLNS and VDQLNSS (positions 772 and 773) were found only in S glycoprotein, while the epitopes LTPTSSY, TPTSSYV, PTSSYVD, TSSYVDV, DHGDYYV, YSQDVKQ, ANQYSPC, NQYSPCV and YYRKQLS (positions 15, 16, 17, 18, 83, 108, 523, 524 and 543) were found only in modified S glycoprotein. My results for S and modified S glycoprotein partially agree with the study done at Africa City of Technology, Khartoum, Sudan by Badawi M.M et al, 2016 for the epitopes GTPPQVY at position 391-397 and LTPRSVRSVP at position 745-754; the differences may be due to the different numbers of MERS-CoV protein sequences selected.
For the prediction of cytotoxic T-lymphocyte epitopes and their interaction with MHC class I, ILDYFSYPL was found in my study as well as in the studies of Badawi M.M et al, 2016 and POORINMOHAMMAD N & MOHABATKAR H, 2014. There was partial similarity with the Iranian study (POORINMOHAMMAD N & MOHABATKAR H, 2014) for the epitopes LLSGTPPQV, ILATVPHNL, LQMGFGITV and FSNPTCLIL, except that NLTTITKPL was absent from the S and modified S sequences in my study; FSNPTCLIL was the only epitope found in my study in both the S and modified S sequences. FSFGVTQEY has a high affinity for many alleles, and this finding agrees with Badawi M.M et al, 2016, as does ITYQGLFPY in the S glycoprotein sequence of my study; however, the number of selected epitopes that reacted with MHC-I was higher in my study than in Badawi M.M et al, 2016. In E protein, the epitope FIFTVVCAI had the highest allele affinity, followed by ITLLVCMAF, IVNFFIFTV and LVQPALYLY. In modified E protein, by contrast, the epitope LVQPALSLY showed the highest affinity, followed by LYMTGRSVY, WFIPNFFDF, YMTGRSVYV, ITLLVCTAF, FVQERIGWF, FLTATHLCV and CMTGFNTLL; the last epitope is common to the E and modified E protein sequences.
For the prediction of T-helper cell epitopes and their interactions with MHC class II, Badawi M.M et al, 2016 considered FNLTLLEPVSISTGS the most suitable epitope, with high affinity to 26 alleles. This epitope was also found in the S and modified S sequences of my study, but here it could not be considered the most suitable epitope with high binding affinity to different alleles, as it was in the Badawi M.M et al, 2016 study.
There are no published results on E protein, modified E and S glycoprotein epitope vaccines, apart from the partial similarity that I found for the S and modified S glycoprotein results in this study.
No previous study has examined allergic reactions to S glycoprotein and E protein; the only such study was done by Shi J et al, 2015 for N protein. In this study, S and E protein showed no allergic reaction according to AllerHunter. Furthermore, Shi J et al, 2015 reported that, for N protein, the analysis of the surface accessibility of the predicted peptides gave a maximum surface probability value of 6.971 at amino acid positions 363 to 368 (363KKEKKQ368) and a minimum value of 0.074 for the peptide 205GIGAVG210. The analysis of the flexibility of the predicted peptides gave a maximum flexibility value of 1.160 at amino acid positions 170 to 176 (167GNSQSSS173) and a minimum value of 0.903 for the peptide 97RWYFYYT103. For MHC-II, the epitope 329LRYSGAIKL337, interacting with 357 HLA-DR alleles, had the maximum number of binding HLA-DR alleles, while 230VKQSQPKVI238, interacting with 94 HLA-DR alleles, had the minimum. The same occurred with MHC-I, where KQLAPRWYF100 had the highest number of binding HLA-A alleles, followed by 343NYNKWLELL351, 72AQNAGYWRR80 and 387RVQGSITQR395 (see Shi J et al, 2015 for population coverage). In addition, the study done by Sharmin R and Abul Bashar Khademul Islam M M AB, 2014 showed that WDYPKCDRA is a highly conserved epitope in the RNA-directed RNA polymerase of human coronaviruses. They applied a multiple sequence alignment (MSA) approach to the spike (S), membrane (M), envelope (E) and nucleocapsid (N) proteins and replicase polyprotein 1ab to identify which one is highly conserved in all coronavirus strains, and then used various in silico tools to predict consensus immunogenic and conserved peptides.
Further information not shown here: I used the software below to confirm the MHC-II results, and their results partially agreed with the IEDB MHC-I results, for reasons I could not determine: EpiDOCK, a molecular docking-based tool for MHC class II binding prediction (https://epidock.ddg-pharmfac.net/), and EpiTOP 1.0 (https://www.pharmfac.net/EpiTOP/index.php). I also do not agree with Shi J et al, 2015, who aligned S, E, M, … across all human coronaviruses and reported that the most common peptide was found in N protein alone. When I aligned S, M, ORFA1, …, I found some alignments between those proteins and different coronavirus strains, which may mean that some common peptides are present, but this still needs more study.
5. Conclusions
As I mentioned before, software-based vaccine and drug design has become very important in both first and third world countries to avoid wasting resources, time and effort. For MERS-CoV, it is important to design an effective vaccine that protects not only against MERS-CoV but also against newly emerging strains and the other human coronaviruses, especially since MERS-CoV vaccines have not yet passed all vaccine design protocols.
In this study I found the following points: The emergence of new strains may cause only a minor change in the peptide sequence vaccine, especially when the selected virus parts are neither longer nor shorter in length.
In B-cell prediction, mutations can lead to an increased number of selected epitopes with very few sequence changes, in addition to a large number of epitopes shared between the reference and modified sequences; this means the mutated sequence is able to elicit the same immune response (IR), that is, a response to the virus by the same antibodies as in the first infection.
Mutations of the virus sequence can change the frequencies of alleles and peptides, either increasing or decreasing them, and can lead to the presence or absence of some new or old alleles or peptides; the same alleles can have different peptide sequences and vice versa.
For MHC-II, no changes were noticed between E and modified E protein in alleles, peptide sequences or their frequencies, which may be due to the short E protein sequence, while for S and modified S glycoprotein there were minor differences in some peptide frequencies, which rose or fell by only one or two, and the same was true for alleles.
There is some allele similarity between E, S and the modified E and S proteins in MHC-II, besides a tiny difference between the S and modified S peptide sequences in MHC-II due to the modification that I introduced into the S reference sequence.
The absence of a very few peptide sequences of the S reference sequence from the modified S sequence leads to the presence of new peptide sequences.
In MHC-I, many of the selected peptide sequences present in the S glycoprotein reference sequence are missing from the modified one; the reverse holds for E protein, where additional epitopes are present in the modified sequence.
The presence of arginine in some selected peptide sequences makes the vaccine ineffective, so we need to solve this problem either by replacing it with another amino acid from the same group or by finding other ways to make those epitopes visible to the immune system (IS).
The presence of a mutated sequence can affect the population coverage in MHC-II through the presence or absence of some countries and changes in percentage; in MHC-I, by contrast, no changes were noticed.
Acknowledgements
The author would like to thank Allah, her family for always supporting her, and the members of The National Ribat University.
Statement of Competing Interests
The author declares that she has no competing interests.
References
[1] Coronavirus-Vaccine-a-6110.html, 2013.
[2] https://en.wikipedia.org/wiki/Coronavirus, 2014.
[3] Khan G (2013). A novel coronavirus capable of lethal human infections: an emerging picture. Virology Journal. 10 (66). https://virologyj.biomedcentral.com/articles/10.1186/1743-422X-10-66.
[4] Modjarrad K (2016). MERS-CoV vaccine candidates in development: The current landscape. In Vaccine. WHO Product Development for Vaccines Advisory Committee (PDVAC) Pipeline Analyses for 25 Pathogens. Science Direct. Volume 34, Issue 26, 3 June 2016, Pages 2982-2987.
[5] Ithete NL, Stoffberg S, Corman VM, Cottontail VM, Richards LR, Schoeman MC, Drosten C, Drexler JF, Preiser W (2013). Close Relative of Human Middle East Respiratory Syndrome Coronavirus in Bat, South Africa. Publisher: CDC; Journal: Emerging Infectious Diseases; Article Type: Letter; Volume: 19; Issue: 10; Article ID: 13-0946.
[6] Wang L, Shi W, Joyce G. M., Modjarrad K, Zhang Y, Leung K, Lees R. C, Zhou T, Yassine M. H,…., Graham S. B (2015). Evaluation of candidate vaccine approaches for MERS-CoV. Nature Communications 6, Article number: 7712. https://www.nature.com/articles/ncomms8712.
[7] Kim Y, Ponomarenko J, Zhu Z, Tamang D, Wang P, Greenbaum J, Lundegaard C, Sette A, Lund O, Bourne PE, Nielsen M, Peters B (2012). Immune epitope database analysis resource. NAR.
[8] Sidney J, Assarsson E, Moore C, Ngo S, Pinilla C, Sette A, Peters B (2008). Quantitative peptide binding motifs for 19 human and mouse MHC class I molecules derived using positional scanning combinatorial peptide libraries. Immunome Res 4:2.
[9] Hoof I, Peters B, Sidney J, Pedersen LE, Sette A, Lund O, Buus S, Nielsen M (2009). NetMHCpan, a method for MHC class I binding prediction beyond humans. Immunogenetics 61: 1-13.
[10] Nielsen M, Lundegaard C, Worning P, Lauemøller SL, Lamberth K, Buus S, Brunak S, Lund O (2003). Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci 12: 1007-1017.
[11] Peters B, Sette A (2005). Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioinformatics 6:132.
[12] Karosiene E, Rasmussen M, Blicher T, Lund O, Buus S, Nielsen M (2013). NetMHCIIpan-3.0, a common pan-specific MHC class II prediction method including all three human MHC class II isotypes, HLA-DR, HLA-DP and HLA-DQ. Immunogenetics 65(10):711.
[13] Nielsen M, Lundegaard C, Blicher T, Lamberth K, Harndahl M, Justesen S, Roder G, Peters B, Sette A, Lund O, Buus S (2007). NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence. PLoS ONE 2:e796.
[14] Nielsen M, Lundegaard C, Blicher T, Peters B, Sette A, Justesen S, Buus S, Lund O (2008). Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan. PLoS Comput Biol. 4(7):e1000107.
[15] POORINMOHAMMAD N, MOHABATKAR H (2014). Identification of HLA-A*0201-restricted CTL epitopes from the receptor-binding domain of MERS-CoV spike protein using a combinatorial in silico approach. Turk J Biol, 38: 628-632 © TÜBİTAK. https://journals.tubitak.gov.tr/biology/issues/biy-14-38-5/biy-38-5-10-1401-21.pdf.
[16] Badawi M.M, Salaheldin A.M, Suliman M.M, AbduRahim A.S, Mohammed AE.A, SidAhmed S A.A, Othman M.M, Salih A.M Salih (2016). In Silico Prediction of a Novel Universal Multi-epitope Peptide Vaccine in the Whole Spike Glycoprotein of MERS CoV. American Journal of Microbiological Research. Vol. 4, No. 4, 2016, pp 101-121.
[17] Du L, Zhao G, Kou Z (2013). Identification of a receptor-binding domain in the S protein of the novel human coronavirus Middle East respiratory syndrome coronavirus as an essential target for vaccine development. J Virol. 87(17):9939-42.
[18] Mohamed H.A, Mohamed Y. O, AB. Salam S, Yousif A.H, Hassan M.M, Kaheel H.H and Hassan A.M (2014). In Silico analysis of Single Nucleotide Polymorphisms (SNPs) in human FANCA gene. International Journal of Computational Bioinformatics and In Silico Modeling. Vol. 3, No. 5 (2014): 502-513.
[19] Shi J, Zhang J, Li S, Sun J, Teng Y, Wu M, Li J, Li Y, Hu N, Wang H, Hu Y (2015). Epitope-Based Vaccine Target Screening against Highly Pathogenic MERS-CoV: An In Silico Approach Applied to Emerging Infectious Diseases. PLoS ONE 10(12): e0144475.
[20] Sharmin R, Abul Bashar Khademul Islam M M AB (2014). A highly conserved WDYPKCDRA epitope in the RNA directed RNA polymerase of human coronaviruses can be used as epitope-based universal vaccine design. BMC Bioinformatics 2014, 15:161.
[21] Saha, S. and Raghava, G.P.S. (2006). AlgPred: prediction of allergenic proteins and mapping of IgE epitopes. Nucleic Acids Research. Volume 34, W202-W209.
[22] Doytchinova A.I and Flower R.D (2007). VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics 8:4.
[23] Doytchinova A.I and Flower R.D (2007). Identifying candidate subunit vaccines using an alignment-independent method based on principal amino acid properties. Vaccine 25:856-866.
[24] Doytchinova A.I and Flower R.D (2008). Bioinformatic Approach for Identifying Parasite and Fungal Candidate Subunit Vaccines. Open Vaccines Journal 1: 22-26.