Encoding Information in DNA: From Basic Structure to Nanoelectronics

DNA (deoxyribonucleic acid) computing is a recent computing technique, also referred to as biomolecular or molecular computing. It offers a new avenue for solving computational problems by manipulating nanoscopic molecules, and its approaches are now being applied to combinatorial problems by exploiting the parallelism and high-density storage characteristics of DNA. In addition, DNA is considered one of the most feasible substances for shaping nanoscopic materials, manufacturing nanomechanical devices, and forming large-scale nanostructures, owing to its convenient structural features and molecular recognition properties. This paper gives a concise discussion of the advances in constructing nanoelectronics with the DNA computing paradigm and of the challenges that DNA computing faces.


Introduction
The basic concept of DNA computing, or more generally biomolecular computing, arose from the idea of using biomolecules as the elementary constituents of computing devices. One of the main goals of this relatively new field is to develop an alternative to the silicon-based computer, built with biomolecular computing techniques based on DNA molecules. Researchers from disciplines such as biological science, mathematics, and information technology are devoted to this multidisciplinary area. Von Neumann's theory of self-reproducing automata, set out in the 1940s, led later computer scientists to the idea of molecular computing (Ezziane, 2005) [17]. The physicist Richard Feynman suggested that living cells and molecular particles could be used to construct nanoscale computing devices, and also explained the difficulties of operating and controlling systems at the nanoscale (Amos, 2008) [4]. Inspired by the remarkable activities that cells perform in all living organisms within their nanoscopic size, Feynman drew attention to the opportunities that biological systems offer for nanoscale information processing and storage (Feynman, 1961) [18]. Leonard Adleman was the pioneer who gave physical form to Feynman's unrealized vision by using biomolecular tools for DNA computing. In 1994, Adleman, of the University of Southern California, solved an instance of a hard problem known as the Directed Hamiltonian Path Problem by manipulating DNA strands, which is regarded as the first use of a biological procedure to solve a mathematical problem (Adleman, 1994) [2]. Since Adleman's experiment, DNA computing has become a fertile platform for researchers from many disciplines.
Research in DNA computing proceeds in two different ways: a theoretical way concerned with patterns, methods, and paradigms for DNA computing, and an empirical way that tests biochemical feasibility (Watada & Rohani Abu Bakar, 2008) [50]. Nowadays many researchers are involved in DNA computing to improve existing nanostructures, invent new devices, and meet the remaining challenges.

Basic Structure of DNA
Deoxyribonucleic acid (DNA) is a fundamental component of all living organisms. It holds the hereditary characteristics, information, and instructions that an organism needs to live, develop, and reproduce, and it passes these characteristics from one generation to the next, preserving heredity. James Watson and Francis Crick worked out the double helical structure of DNA. Chemical and physical-chemical analysis had shown DNA to be a long fibrous molecule, and crystallographic evidence from X-ray studies of the sodium salt of DNA suggested that two polynucleotide chains form the structural unit of DNA (Watson & Crick, 1953) [51]. The molecular building blocks of this self-replicating material are called nucleotides, each consisting of a nitrogenous base, a sugar, and a phosphate. Adenine (A), Thymine (T), Guanine (G), and Cytosine (C) are the four nitrogenous bases found in DNA. A short single-stranded polynucleotide chain, generally fewer than 30 nucleotides, is known as an oligonucleotide. DNA consists of a double helical polynucleotide chain laid out like a twisted staircase, with the two strands coiled plectonemically. Figure 1 (a and b) below shows the double helical structure and the molecular structure of DNA.
The nitrogenous bases pair with each other according to the Watson-Crick complementarity rule. Under this rule, Adenine pairs with Thymine through two hydrogen bonds, while Guanine pairs with Cytosine through three hydrogen bonds. Double-stranded DNA is formed by combining two single strands, and sequences are usually written in the 5'-3' direction (Nordiana Rajaee, Azham Zulkharnain & Awang Ahmad Sallehin Awang Hussaini, 2016) [32].
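The complementarity rule can be illustrated with a minimal Python sketch. The function names below are our own and are chosen only for illustration; the hydrogen-bond counts are those stated above (two for A-T, three for G-C).

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hydrogen bonds contributed by the base pair that each base takes part in.
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, itself written in 5'-3' order."""
    return "".join(PAIR[base] for base in reversed(strand.upper()))


def hydrogen_bond_count(strand: str) -> int:
    """Count the hydrogen bonds holding a strand to its full complement."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    strand = "ATGCCG"
    print(reverse_complement(strand))
    print(hydrogen_bond_count(strand))
```

The complement is reversed because the two strands of a duplex run in opposite directions, so reading the partner strand in its own 5'-3' order reverses the base order.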

Computing with DNA
Computing with DNA is a branch of biomolecular computing in which DNA is used as the information carrier to perform arithmetic and logic operations. Researchers are now working to create biochips by embedding DNA molecules in a chip, and they have already been able to put these molecules to use. The storage capacity of DNA is greater than that of existing silicon chips: tens of terabytes of data can be stored in 1 cm^3 of DNA material (Georgalis, 2016) [19]. The reason is that a silicon-based electronic computer holds information in two binary digits, 0 and 1, whereas the four-letter alphabet A, T, G, and C of a DNA strand lets each position carry more information (Somnath Tagore, Saurav Bhattacharya, Md Ataul Islam & Md Lutful Islam, 2010) [45]. The cherished goal of DNA computing is to use biological molecules to carry out computation instead of the customary silicon chips.
Leonard Adleman initiated this multidisciplinary field as a new approach to massively parallel computation, and proved the concept of computing with DNA by solving a seven-vertex Hamiltonian Path Problem in 1994 (Kim, Jeng & Watada, 2006) [22]. Adleman used biomolecular tools such as ligation, amplification, and hybridization to produce a Hamiltonian path from DNA, encoding the vertices and edges of the graph in oligonucleotides (Deaton, Murphy, Garzon, Franceschetti & Stevens, 1996) [12]. Although a single DNA reaction is slow compared with a single operation of an existing electronic computer, the parallel processing characteristic of DNA plays a vital role in the DNA computing paradigm (Watada, 2008) [49]. Because computations are performed simultaneously, billions or even trillions of DNA molecules can take part in chemical reactions at the same time, and the process consumes less energy (Kari & Landweber, 1999) [21].
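The difference between a two-symbol and a four-symbol alphabet can be made concrete with a minimal Python sketch. It assumes a hypothetical mapping of two bits to one base (00 to A, 01 to C, 10 to G, 11 to T); this mapping is chosen only for illustration and is not taken from the works cited above.

```python
# Hypothetical 2-bit mapping: index 0b00 -> A, 0b01 -> C, 0b10 -> G, 0b11 -> T.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    encoded = bytes_to_dna(b"DNA")
    print(encoded, dna_to_bytes(encoded))
```

Under this assumption one byte occupies four bases rather than eight binary digits. Practical schemes usually give up some of this density in order to avoid error-prone sequences.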
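The generate-and-filter logic of such an experiment can be sketched in software. The Python example below is only an analogy: exhaustive enumeration stands in for the random ligation of edge oligonucleotides, list filtering stands in for the laboratory separation steps, and the five-vertex graph is a hypothetical example rather than the graph used by Adleman.

```python
from itertools import product


def hamiltonian_paths(vertices, edges, start, end):
    """Return all paths from start to end that visit every vertex exactly once."""
    edge_set = set(edges)
    n = len(vertices)
    # Step 1 (stand-in for ligation): form every sequence of n vertices
    # in which each consecutive pair is joined by a directed edge.
    candidates = [
        path for path in product(vertices, repeat=n)
        if all((a, b) in edge_set for a, b in zip(path, path[1:]))
    ]
    # Step 2: keep only paths with the required start and end vertices.
    candidates = [p for p in candidates if p[0] == start and p[-1] == end]
    # Step 3: keep only paths in which no vertex is repeated.
    return [p for p in candidates if len(set(p)) == n]


if __name__ == "__main__":
    vertices = [0, 1, 2, 3, 4]
    edges = [(0, 1), (0, 2), (1, 2), (2, 1), (1, 3), (2, 3), (3, 4), (2, 4)]
    print(hamiltonian_paths(vertices, edges, start=0, end=4))
```

In the test tube the first step happens for all candidate paths at once, which is where the parallelism of DNA is exploited. The software version has to visit the candidates one by one.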

Problems Solved by DNA Computing
DNA computing is a fertile platform for surpassing the performance of existing electronic computers by exploiting the vast parallelism of DNA. Using this property, scientists have solved complex computational problems such as the Hamiltonian Path Problem, the Maximal Clique Problem, the Satisfiability Problem, and a Chess Problem (Ezziane, 2005) [17]. Leonard Adleman designed a seven-node instance of an NP-complete problem: given seven cities, find a tour that starts at a given source city, ends at another given city, and visits each of the seven cities exactly once (Lipton, 1995b) [27]. The Directed Hamiltonian Path Problem is NP-complete and therefore among the hardest problems in NP, so Adleman's experiment solved not merely a mathematical puzzle but an instance of a hard computational problem (Kari, 1997) [20].
The outcome of Adleman's experiment was inspiring, but the method cannot practically solve all instances of NP problems. Lipton (1995b) [27] pointed out that Adleman used a brute-force approach to the HPP, which is not efficient enough once such a problem involves, say, 100 cities. The running time of the experimental algorithm grows exponentially with the number of cities, so the method becomes expensive for HPPs with many cities. Adleman's demonstration was, in this sense, a trivial problem, as it involved only seven cities. For an HPP with a large number of cities, the required amount of DNA would make the approach very difficult (Dimitrova, 2006; Tagore et al., 2010) [14,45]. Because any large-scale problem requires a huge quantity of DNA molecules, some conclude that DNA computing does not provide any fundamentally new computing ability. The error rate is another limitation of Adleman's method: to keep errors from accumulating, the number of iterations of the method should be limited (Tagore et al., 2010) [45]. Although biological computation has more potential for parallelism than conventional computation, the number of biological operations that can be performed per second is small, while a conventional electronic computer easily executes millions of instructions per second (Lipton, 1995b) [27].
Richard J. Lipton examined another NP-complete problem, the satisfiability (SAT) problem. He noted that biological machines will be limited in the amount of parallelism they can perform, that solving a SAT problem on 70 variables directly is better than using the reduction from SAT to HPP, and that the direct approach also has some other technical advantages over the original method (Lipton, 1995a) [26].
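The scale of the brute-force search can be estimated with a short Python sketch. It assumes the worst case of a complete graph, where a search with fixed start and end cities must consider every ordering of the remaining cities, that is, (n - 2)! orderings; the function name is ours and this counting is an illustration, not a figure taken from the cited works.

```python
import math


def candidate_orderings(n_cities: int) -> int:
    """Orderings of the intermediate cities when start and end are fixed."""
    if n_cities < 2:
        raise ValueError("need at least a start city and an end city")
    return math.factorial(n_cities - 2)


if __name__ == "__main__":
    for n in (7, 20, 100):
        count = candidate_orderings(n)
        print(n, "cities:", len(str(count)), "decimal digits")
```

For seven cities the count is 120, which a test tube covers easily. For 100 cities the count has more than 150 decimal digits, which indicates why the amount of DNA needed becomes the limiting factor.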
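A direct approach to SAT can be pictured as a sequence of filtering steps. The Python sketch below is written in the spirit of that idea and is not a description of the laboratory protocol: the list `tube` is a hypothetical stand-in for a test tube that initially holds one strand for every truth assignment, and each clause removes the assignments that do not satisfy it. Clauses are written as lists of signed integers, where `2` means variable 2 is true and `-2` means it is false; this convention is our assumption.

```python
from itertools import product


def solve_sat(num_vars, clauses):
    """Return all truth assignments that satisfy every clause."""
    # Initial tube: every possible assignment of the variables.
    tube = list(product((False, True), repeat=num_vars))
    for clause in clauses:
        # Extraction step: keep assignments that satisfy at least one literal.
        tube = [
            assignment for assignment in tube
            if any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        ]
    return tube


if __name__ == "__main__":
    # (x1 or x2) and (not x1 or x2) and (not x2 or x3)
    formula = [[1, 2], [-1, 2], [-2, 3]]
    print(solve_sat(3, formula))
```

The number of filtering steps grows only with the number of clauses, while the size of the initial tube doubles with each added variable. This is the trade-off between time and volume that bounds the parallelism of a biological machine.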

DNA Computer
A DNA computer is a nanoscale computer that performs complex computation and stores information using DNA molecules. Benenson, Adar, Paz-Elizur, Livneh & Shapiro (2003) [7] introduced an autonomous molecular device in which DNA molecules act as input, output, and software, while DNA restriction and ligation enzymes act as hardware that uses ATP as fuel. A remarkable step towards building a DNA computer was taken by researchers at the University of Rochester, who developed DNA logic gates. These logic gates rely on DNA code, rather than electrical signals, to perform logical operations. Fragments of genetic material are fed to the gates as input, and the gates splice these input fragments together to form a single output (Deepshikha Bhargava & Divya Arora, 2008) [8].
Although the biological components of a DNA computer, such as logic gates and biochips, will take years to mature into an operational machine, scientists believe that it will be more compact, accurate, and efficient than any conventional computer built so far (Deepshikha Bhargava & Divya Arora, 2008) [8], and that it has great potential to yield equipment that is non-toxic and self-powered (Georgalis, 2016) [19].

DNA Self-Assembly for Nanostructure
DNA self-assembly is an epoch-making technique: using this versatile property of DNA, highly promising templates can be formed for precisely positioned nanomaterials. Self-assembly refers to the spontaneous ordering of substructures into superstructures. Single-stranded DNA molecules are synthesized artificially and self-assemble into DNA crossover molecules (tiles), whose sticky ends match one another to form a tiling lattice. Borromean rings, a cube, a truncated octahedron, and DNA knots were the first DNA nanostructures, created by Seeman using DNA branched junctions (Reif, LaBean & Seeman, 2001) [38]. Seeman also generated more rigid junctions that have crossovers and addressable sticky ends at the edges, and these tiles were then used to construct well-defined 2D lattices. Such structures can serve as templates for constructing nanowires and offer numerous opportunities for genomics applications (Faisal A. Aldaye, Palmer & Sleiman, 2008) [3].
Yan, LaBean, Feng, and Reif (2003) [54] described two self-assembly processes, one that constructs a larger, aperiodically patterned DNA barcode lattice and one that constructs a periodic ribbon lattice. Their experiments are steps towards visual readout devices that can transfer information encoded on one-dimensional DNA strands into a two-dimensional form. DNA self-assembly is regarded as a highly promising technique for building complex, nanopatterned molecular components for applications in medicine, sensors, electronics, and many other fields. Researchers are trying to create the layout for a demultiplexed random access memory circuit by DNA self-assembly, which could serve as a template for molecular electronics and open a new era in computing (Yan et al., 2003) [54].
Two notable obstacles stand in the way of DNA computing and of the DNA self-assembly technique: the high cost of synthetic DNA molecules and the high error rate of self-assembly. These limitations are recognized as the major challenges to nanostructure self-assembly (Pinheiro, Han, Shih & Yan, 2011) [35].
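The matching of sticky ends can be illustrated with a minimal Python sketch. It models each tile as a hypothetical triple of a name, a left sticky end, and a right sticky end, and grows a one-dimensional chain by attaching a tile whose left end is the reverse complement of the current right end. The tile sequences are invented for illustration, and real tiles assemble in two dimensions under thermodynamic control, which this sketch does not model.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the Watson-Crick partner of a strand, written 5'-3'."""
    return "".join(PAIR[base] for base in reversed(strand))


def can_bind(end_a, end_b):
    """Two sticky ends hybridize if one is the reverse complement of the other."""
    return end_a == reverse_complement(end_b)


def assemble_chain(tiles, start_name):
    """Greedily grow a chain of tile names, starting from the named tile."""
    by_name = {tile[0]: tile for tile in tiles}
    chain = [by_name[start_name]]
    unused = [tile for tile in tiles if tile[0] != start_name]
    while True:
        right_end = chain[-1][2]
        match = next((t for t in unused if can_bind(right_end, t[1])), None)
        if match is None:
            break
        chain.append(match)
        unused.remove(match)
    return [tile[0] for tile in chain]


if __name__ == "__main__":
    # Hypothetical tiles: (name, left sticky end, right sticky end).
    tiles = [("T3", "AATC", "TTTT"), ("T1", "CCCC", "ACGG"), ("T2", "CCGT", "GATT")]
    print(assemble_chain(tiles, "T1"))
```

The order in which the tiles are listed does not matter. The sticky-end sequences alone determine the order of assembly, which is the sense in which the structure is programmed by the DNA itself.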

DNA Nano Mechanical Devices
DNA-based nanomachines are known as DNA supramolecular structures. Several DNA nanomechanical devices have been built whose motion is mediated by external environmental changes, such as the addition and removal of DNA fuel strands or a change in the ionic solution (Tagore et al., 2010) [45]. Xia et al. (2008) [52] described a synthetic nanopore-DNA system in which single solid-state conical nanopores can be reversibly gated by switching DNA motors immobilized inside the nanopores. This DNA motor-driven nanopore switch could be used in the near future to build nanopore machines with more precisely controlled functions (Xia et al., 2008) [52]. Dittmer and Simmel (2004) [15] combined DNA tweezers with the transcription machinery of prokaryotic organisms; gel electrophoresis and FRET experiments verified that the transcription process successfully reads out the gene and automatically brings the machine into the desired state. Shin and Pierce (2004) [42] and Sherman and Seeman (2004) [40] demonstrated synthetic molecular walkers that mimic the bipedal gait of kinesin, with real-time monitoring of walker movement achieved by multiplexed fluorescence quenching. Tian and Mao (2004) [46] proposed a simple molecular gear system of two gears made of DNA and fueled by DNA itself. DNA hybridization provides the energy for motion, the rolling direction can be set by instruction, and the gears were observed to roll continuously against each other (Tian & Mao, 2004) [46]. This work, together with the recently demonstrated walkers and DNA motors, marks a step towards DNA nanomachines capable of complicated motion.
Multiple simple DNA motors can be combined into complicated nanomachines, in which each simple motor acts as a unit cell that communicates efficiently and works cooperatively with the others. Further work on DNA nanostructures can yield many unique DNA motors for building complex nanomechanical devices.

DNA for Conducting Nanowire and Nanotubes as Circuit Elements
Recent advances in molecular technology indicate that conventional microelectronics could be replaced by molecular-scale circuits with very high circuit densities. In many cases electrons have been observed to flow through DNA, and DNA has shown semiconducting, superconducting, and insulating characteristics. Although the conductivity of bare DNA is insufficient for nanoelectronics engineering, DNA has been used as a template to arrange more highly conductive materials for electronic applications. M-DNA is one such form, in which the imino proton of each DNA base pair is replaced by a metal ion. The work in [30] showed that nanoscale gold wires can be built on a DNA lattice. Monson and Woolley (2003) [31] reported an experiment that constructs copper nanowires using a DNA template. Park, Yan, Reif, LaBean, and Finkelstein (2004) described an electroless deposition method that metallizes double-stranded DNA in silver (Ag) solution to form silver nanowires on a self-assembled DNA scaffold template. Liu, Park, Reif and LaBean (2004) [33] further constructed and characterized a self-assembling, superstructured DNA nanotube composed of three-dimensional DNA crossovers modified with double-stranded, thiol-containing DNA molecules. In the strategy of Ma, J. Zhang, G. Zhang, and He (2004) [29], pre-stretched and immobilized DNA strands were used as templates to fabricate conducting polymer nanowires on thermally oxidized Si surfaces. Silicon nanowires with ssDNA probes covalently immobilized on their surfaces have been used to build highly sensitive, sequence-specific DNA sensors (Li et al., 2004) [25].

DNA as Data Storage
DNA is the information carrier of all living organisms and can store a huge amount of genetic information in a small space. A substantial amount of information can be stored in the nucleotide sequence of DNA, and because of this property DNA is considered for use as a reproducible, heritable storage medium (Yachie, Sekiyama, Sugahara, Ohashi and Tomita, 2007) [53]. A Bacillus subtilis spore with a genome of 4.2 megabase pairs and a diameter of 1 µm holds about twenty million times more characters per unit area (char/m^2) than a 200-megabyte Zip disk with a diameter of 1 cm. Because DNA has a long shelf life and its sequence is not damaged during denaturation, data can be stored in it at high density without attenuation, and any desired number of copies can be obtained with the Polymerase Chain Reaction technique (Siddhant Shrivastava & Rohan Badlani, 2014) [43]. Yachie et al. (2007) [53] presented a system for storing and retrieving data based on the sequential arrangement of DNA molecules in living cells, and succeeded in storing the message "E=mc^2 1905!" in B. subtilis strain BEST2136. Because the data are inserted into a living organism, they are inherited from generation to generation, and this simple alignment-based retrieval promises robust data inheritance. Bacillus subtilis is suggested as a high-density storage host, although there is a limit to how much nucleotide sequence can be inserted artificially into chromosomal DNA. To store and preserve data at high density in heritable media, the codes and experimental methods need to be developed further, or multiple fragments need to be inserted into the metagenomes of multiple species. The alignment-aided storage and retrieval technique is expected to be efficient in practice (Yachie et al., 2007) [53].
Bonnet, Subsoontorn, and Endy (2012) [9] demonstrated a rewritable recombinase addressable data (RAD) module that stores digital information securely within a chromosome. This RAD memory element allows combinatorial data storage, can be switched repeatedly without loss of performance, and can act as passive information storage, in the absence of heterologous gene expression, for over 100 cell divisions. It is hoped that the DNA inversion RAD module can be translated into long-term data storage applications (Bonnet et al., 2012) [9]. Church, Gao and Kosuri (2012) [10] developed an encoding scheme in which an HTML-coded draft was converted and then encoded into oligonucleotides; the error rate is remarkably reduced in this scheme. Staden (1980) [44] described a new way of storing DNA gel reading data. Shin and Pierce (2004) [41] described a DNA scaffold that supports a one-dimensional array of independently and reversibly addressable sites at 7 nm spacing. This controllable nanopatterning capability can be applied to molecular transport, propagating cargo through an array of address branches. Zhang and Kim (2006) [55] described a computational model of content-addressable information storage and retrieval based on the hyper-network architecture; their simulation results showed that a large number of short DNA strands can be effective in information processing.
DNA may well become a universal storage medium in the near future, but at present it faces challenges, some arising from its structural composition and some from immature technology. The whole process of encoding, amplifying, sequencing, reconstructing, and decoding takes longer than with existing electronic devices, which suggests that DNA is unlikely to compete with conventional formats on speed. Working with DNA also introduces many types of error, such as homopolymers, sequencing errors, and errors related to the low access rate. In living cells DNA is repaired by error-correcting enzymes, but no such enzymes are available for artificial DNA. Information can be lost if DNA strands have to be discarded because of inefficient decoding, and repeating the process wastes DNA. According to Huffman, the best storage and lossless compression occur for base three, which is why DNA, with its four bases, appears inefficient for data storage. The difficulty of synthesizing the long DNA sequences that a design simulated on a computer may call for, together with the high cost of computing, are further major challenges for practical DNA storage (Siddhant Shrivastava & Rohan Badlani, 2014) [43]. The growing quantity of redundant data produced by new sequencing technology is also a great challenge for sequence storage and management (Batley & Edwards, 2009) [6].
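Why keeping several copies of a stored sequence makes retrieval robust can be illustrated with a small Python sketch. It assumes that the retrieved copies have already been aligned and have equal length, and it simply takes a majority vote at each position; this is a deliberate simplification for illustration and is not the retrieval procedure of the cited work.

```python
from collections import Counter


def consensus(copies):
    """Recover a sequence from aligned copies by per-position majority vote."""
    if len({len(copy) for copy in copies}) != 1:
        raise ValueError("copies must be non-empty and of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )


if __name__ == "__main__":
    # Three hypothetical copies, two of which carry a single point mutation.
    copies = ["ACGTAC", "ACGTTC", "ACCTAC"]
    print(consensus(copies))
```

As long as mutations in different copies fall at different positions, the original sequence is recovered even though no single copy is guaranteed to be intact.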
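The link between base-three coding and homopolymer errors can be illustrated with a rotating ternary code, sketched below in Python. Each ternary digit (trit) selects one of the three bases that differ from the previous base, so the same base never appears twice in a row. The starting base and the ordering of the choices are our assumptions, and the sketch is an illustration rather than the scheme of any work cited here.

```python
BASES = "ACGT"


def trits_to_dna(trits, prev="A"):
    """Encode trits (0, 1, 2) so that no base is ever repeated consecutively."""
    out = []
    for trit in trits:
        # The three bases that differ from the previous one, in fixed order.
        choices = [base for base in BASES if base != prev]
        prev = choices[trit]
        out.append(prev)
    return "".join(out)


def dna_to_trits(strand, prev="A"):
    """Decode a strand produced by trits_to_dna with the same starting base."""
    trits = []
    for base in strand:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


if __name__ == "__main__":
    message = [0, 0, 0, 2, 1]
    strand = trits_to_dna(message)
    print(strand, dna_to_trits(strand))
```

Even a run of identical trits produces an alternating strand. The price is that each base carries one trit (about 1.58 bits) instead of the two bits that a free choice among four bases would allow.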

Computer Architecture
The use of DNA self-assembly for circuit design is a promising research area, and it leads scientists to expect large-scale manufacturing of a variety of computer architectures able to solve complex computational problems. Numerous designs have been proposed for devices that could be used in the long-sought DNA computer. The Decoupled Array Multi-Processor (DAMP) and the Single Instruction Multi-Data (SIMD) machine are two self-assembled parallel computer architectures. The two architectures are similar in design and depend greatly on the time taken to compute each operation. They are also capable of solving more complex computational problems, such as a 15-node Hamiltonian Path Problem. SIMD closely resembles a content-addressable memory, but the self-assembly property of SIMD makes a noticeable difference between the two. These architectures are enabled by a directed self-assembly technique that uses DNA hybridization to construct precise nanoscopic circuitry; the process is still under development and is seen as a replacement for the photolithography used in conventional silicon technologies. The challenge for self-assembly is to become a viable alternative approach to large-scale computing. To exploit the advantages of DNA self-assembly, the dramatic change in fabrication scale has to be balanced against the limitation on circuit size (Dwyer, Poulton, Taylor & Vicci, 2004) [16].

Synthetic Protocols for Communication Network
Biological cells can transmit, receive, and process information using signal transduction mechanisms and signaling networks, sustaining interactions within a highly complex biochemical system. This ability has great potential as a platform for new applications in information processing. A good combination of molecular computing and molecular communication, which can be termed bio-nano communication, can form an efficient computing mechanism and yield protocols for communication among bio-nano devices. The development of communication protocols for nano networks can lay the foundation for the next generation of bio-nano devices (Walsh et al., 2009) [47]. Walsh et al. (2009) [47] proposed a biological cell-based communication protocol that enables communication between biological nano devices and specifies the biomolecule address encoding, decoding, error correction, and link switching mechanisms for molecular communication networks. Walsh, Balasubramaniam, Botvitch and Donnelly (2010) [48] presented a molecular communication and data transmission method based on synthetic molecular computing techniques, demonstrating how molecular computing mechanisms can implement elements from various layers of the communication stack. The main applications of this DNA-based protocol are devices that require rich sensing capabilities, as it is well suited to periodic monitoring and sensing. The major challenge for this approach is its complexity, which arises from the high degree of coordination required at each step of the protocol and from the need for the virus to pick up the message molecule before leaving the cell. The number of destination nodes that each node can transmit to is also considered a notable challenge for DNA-based protocols (Walsh et al., 2010) [48].

Security Management in Cloud Computing Based on DNA
Data protection schemes of various kinds are essential for protecting data in unsecured networks such as the internet, because the confidentiality of data has become a great challenge in the cloud computing environment. Data hiding is a well-known technique for protecting data sent over the internet; it aims to shut out intruders while granting access to authorized clients. In these circumstances, DNA computing is recognized as a potential way to secure data in cloud storage. It is difficult for attackers to find information hidden in a DNA sequence, because the hidden data are hard to distinguish within the sequence (Mohammad Reza Abbasy and Bharanidharan Shanmugam, 2011) [1]. To make data retrieval harder for an attacker and to increase the confidentiality of information encoded in DNA sequences, Mohammad Reza Abbasy and Bharanidharan Shanmugam (2011) [1] suggested an algorithm for data hiding in DNA sequences based on binary coding and complementary pair rules. Ranalkar and Phulpagar (2014) [37] demonstrated another security strategy, based on DNA cryptography, that can provide secure data storage across multiple clouds, improve customer satisfaction, and attract more investment from industry as well as from future research firms. Cui, Qin, Wang, and Zhang (2008) [11] also designed an encryption scheme with high confidentiality strength. Leier, Richter, Banzhaf, and Rauhe (200) [24] showed two different cryptographic approaches: one uses steganography to provide rapid encryption and decryption, and the other can be used as a kind of molecular checksum that helps to strengthen security.
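How a binary coding rule and a complementary pair rule can be combined may be sketched in a few lines of Python. Both rule tables below are hypothetical choices made for illustration (the complementary rule shown is simply Watson-Crick pairing), and the sketch is not the algorithm of the cited authors. It shows only the encoding idea and provides no real security on its own.

```python
# Assumed binary coding rule: two bits per base.
BINARY_RULE = {"00": "A", "01": "C", "10": "G", "11": "T"}
# Assumed complementary pair rule: substitute each base by its partner.
COMPLEMENT_RULE = {"A": "T", "T": "A", "G": "C", "C": "G"}


def hide(message: str) -> str:
    """Turn a text message into a DNA string using the two rules in turn."""
    bits = "".join(f"{byte:08b}" for byte in message.encode("utf-8"))
    bases = (BINARY_RULE[bits[i:i + 2]] for i in range(0, len(bits), 2))
    return "".join(COMPLEMENT_RULE[base] for base in bases)


def reveal(strand: str) -> str:
    """Invert hide(): undo the complementary rule, then the binary rule."""
    inverse_complement = {v: k for k, v in COMPLEMENT_RULE.items()}
    inverse_binary = {v: k for k, v in BINARY_RULE.items()}
    bits = "".join(inverse_binary[inverse_complement[b]] for b in strand)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


if __name__ == "__main__":
    hidden = hide("cloud")
    print(hidden, reveal(hidden))
```

In a data hiding scheme the two rule tables act as shared secrets between sender and receiver, and the resulting strand would typically be embedded in a longer reference sequence so that it does not stand out.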

Solving Clustering Problem
Clustering is a method for finding structure in high-dimensional data and for forming relationships among data and information granules. Rohani Abu Bakar, Watada, and Pedrycz (2008) [5] proposed an algorithm, together with the corresponding mechanism, for a clustering technique based on DNA computing that can deal with an unknown number of clusters, huge data sets, and heterogeneous data. The algorithm has a sensitive point: if only a small number of DNA sequence patterns is considered in the final stage, inappropriate sequences can interfere and affect the final result. The authors suggest an effective remedy, which is to involve more patterns, so that a small number of unsuitable sequences, such as outliers, cannot influence the final clustering result (Rohani Abu Bakar et al., 2008) [5].

Conclusion
Computing with DNA is a promising platform on which new methods and materials can be developed to build the microprocessors of the next generation. The unique properties of DNA, such as extreme storage density, enormous parallelism, energy efficiency, miniature scale, and molecular recognition capacity, make it a promising material for forming nanomaterials. DNA manipulation technology has improved rapidly in recent years, which points towards more efficient DNA computers that could outperform existing silicon-based computers. In this paper we have discussed the recent advances, methods, and challenges of DNA computing. Although the field is still in its infancy, researchers are working hard to construct novel nanostructures and computational methods that draw on both the theoretical and the practical advances of DNA computing technology, which could open a new era in computing.
The massive parallelism inherent in DNA leads researchers to expect effective solutions to complex computational problems such as NP-complete problems, whereas traditional silicon-based computers are limited in their capacity for parallel processing. This property also offers the possibility of building a future DNA computer with tremendous processing speed. Through DNA self-assembly, DNA can be used as a template for building nanomaterials. The huge storage density of DNA motivates the construction of DNA storage media that preserve data without attenuation.
Although some obstacles and challenges stand in the way of DNA computing technology, there are many opportunities to design a DNA computer and to create real-life applications that use the appealing features of DNA for information processing and computing. DNA computing is not only capable of solving complex computational problems; it is also a potential paradigm for encrypting data, with the aim of keeping information in the communication sector confidential at a level of security that is difficult even for highly skilled attackers to breach. There is great expectation that the efforts of researchers will come together in a complete, practical, and workable DNA computer in the near future.