The past two decades have seen a dramatic increase in the amount of data stored in electronic form. Storing data has become easier because large amounts of computing power are available at low cost, and the cost of storage keeps falling as hardware technology advances. Research in bioinformatics has accumulated large amounts of data. In the present work, a data mining solution is provided for the problem of protein sequence alignment. Different sequence formats are studied, and the plain text format is chosen for the problem under consideration. A scoring matrix assesses the replacement of one amino acid by another that has been accepted by natural selection. Such a replacement results from two distinct processes: i) the occurrence of a mutation in the portion of the gene template that produces one amino acid of a protein, and ii) the acceptance of that mutation by the species (the protein retains a similar function). PAM (Point Accepted Mutation) and BLOSUM (BLOcks SUbstitution Matrix, derived from the Blocks database) are the scoring matrices used for the different computations. The BLOSUM-50 matrix is used for the problem under consideration. Global and local alignments are predicted along with the alignment score.
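As a rough illustration of how global and local alignment scores can be computed from a scoring matrix, the following is a minimal sketch in Python. The abstract does not name the algorithms used, so this assumes the standard dynamic-programming methods: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. The function names, the linear gap penalty of -8, and the `toy_score` function (a simple match/mismatch stand-in for a real BLOSUM-50 lookup) are all assumptions made for the example.

```python
def toy_score(x, y):
    # Hypothetical stand-in for a BLOSUM-50 lookup: +5 for identical
    # residues, -4 otherwise. A real run would read the matrix entry.
    return 5 if x == y else -4


def global_score(a, b, score=toy_score, gap=-8):
    """Needleman-Wunsch score: align all of a against all of b."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # column 0: prefix of a against nothing
        for j in range(1, len(b) + 1):
            cur.append(max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # substitute/match
                prev[j] + gap,                            # gap in b
                cur[j - 1] + gap,                         # gap in a
            ))
        prev = cur
    return prev[-1]


def local_score(a, b, score=toy_score, gap=-8):
    """Smith-Waterman score: best-scoring pair of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            cell = max(
                0,                                        # start a new alignment
                prev[j - 1] + score(a[i - 1], b[j - 1]),
                prev[j] + gap,
                cur[j - 1] + gap,
            )
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


if __name__ == "__main__":
    print(global_score("WWWHEAG", "HEAGPPP"))  # -28
    print(local_score("WWWHEAG", "HEAGPPP"))   # 20
```

The two sequences share the segment HEAG, which the local score rewards on its own, while the global score also pays for the unmatched flanking residues. Only the scores are computed here; recovering the alignment itself would additionally require a traceback through the full table.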