Open Access Peer-reviewed

Mining Quantitative Association Rules in HIV Protein Sequences

Anubha Dubey1, Usha Chouhan2

1Department of Bioinformatics, MANIT, Bhopal (M.P.), India

2Department of Mathematics, MANIT, Bhopal (M.P.), India

Journal of Biomedical Engineering and Technology. 2013, 1(2), 26-30. DOI: 10.12691/jbet-1-2-2
Published online: August 25, 2017


Although much research has gone into understanding the composition and nature of proteins, many aspects are still not satisfactorily understood. It is now generally believed that the amino acid sequences of proteins are not random, and thus the patterns of amino acids observed in protein sequences are also non-random. In this study, we attempt to decipher the nature of the associations between the different amino acids present in an HIV protein. This basic analysis provides insight into the co-occurrence of certain amino acids in an HIV protein. Such association rules are desirable for enhancing our understanding of protein composition, and they hold the potential to give clues about global interactions among particular sets of amino acids occurring in proteins. The aim of association rule mining is to reveal underlying interactions in large sets of data items. Knowledge of these rules or constraints is highly desirable for the in vitro synthesis of artificial proteins. It will also give new insights into protein-protein interactions in HIV.
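As a rough illustration of the kind of rule the abstract describes, the following minimal sketch computes the support and confidence of one quantitative association rule over amino acid compositions. It is not the authors' procedure: the function names, the toy sequences, and the 25% cut point used to label a residue as "high" or "low" are all assumptions made for this example.

```python
from collections import Counter

def composition(seq):
    """Percentage composition of each amino acid in one sequence."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}

def to_items(seq, threshold=25.0):
    """Discretize a sequence into (amino_acid, level) items.

    `threshold` is a hypothetical cut point, not a value from the paper.
    """
    return {(aa, "high" if pct >= threshold else "low")
            for aa, pct in composition(seq).items()}

def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n = len(transactions)
    ante = sum(1 for t in transactions if antecedent in t)
    both = sum(1 for t in transactions
               if antecedent in t and consequent in t)
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence

# Made-up toy sequences (one-letter amino acid codes), not real HIV data.
sequences = ["AAAALLLLGK", "AALLLLLGGK", "AKKKKGGGLL", "AAALLLGKKK"]
transactions = [to_items(s) for s in sequences]

# Rule: "alanine is high" -> "leucine is high"
support, confidence = rule_stats(transactions, ("A", "high"), ("L", "high"))
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

On these toy sequences the rule holds in two of the four sequences (support 0.50) and in every sequence where alanine is "high" (confidence 1.00). A real analysis would use interval boundaries chosen from the data and a frequent-itemset algorithm such as Apriori rather than checking a single rule by hand.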


Keywords: data mining, quantitative association rule mining, protein composition