COPIA: A New Software for Finding Consensus Patterns in Unaligned Protein Sequences

dc.contributor.author | Liang, Chengzhi | en
dc.date.accessioned | 2006-08-22T14:21:14Z
dc.date.available | 2006-08-22T14:21:14Z
dc.date.issued | 2001 | en
dc.date.submitted | 2001 | en
dc.description.abstract | The consensus pattern problem (CPP) aims to find conserved regions, or motifs, in unaligned sequences. This problem is NP-hard under various scoring schemes. To solve this problem for protein sequences more efficiently, a new scoring scheme and a randomized algorithm based on a substitution matrix are proposed here. Any practical solution to a bioinformatics problem must observe two principles: (1) the problem that it solves accurately describes the real problem; in CPP, this requires that the scoring scheme be able to distinguish a real motif from background; (2) it provides an efficient algorithm to solve the mathematical problem. A key question in protein motif-finding is how to determine the motif length. One problem with EM algorithms for solving CPP is how to find good starting points to reach the global optimum. These two questions were both well addressed under this scoring scheme, which made the randomized algorithm both fast and accurate in practice. A software package, COPIA (COnsensus Pattern Identification and Analysis), has been developed implementing this algorithm. Experiments using sequences from the von Willebrand factor (vWF) family showed that it worked well at finding multiple motifs and repeats. COPIA's ability to find repeats also makes it useful in illustrating the internal structures of multidomain proteins. Comparative studies using several groups of protein sequences demonstrated that COPIA performed better than the commonly used motif-finding programs. | en
dc.format | application/pdf | en
dc.format.extent | 439052 bytes
dc.format.mimetype | application/pdf
dc.identifier.uri | http://hdl.handle.net/10012/1050
dc.language.iso | en | en
dc.pending | false | en
dc.publisher | University of Waterloo | en
dc.rights | Copyright: 2001, Liang, Chengzhi. All rights reserved. | en
dc.subject | Computer Science | en
dc.subject | bioinformatics software | en
dc.subject | multiple alignment | en
dc.subject | motif-finding | en
dc.subject | consensus pattern problem | en
dc.title | COPIA: A New Software for Finding Consensus Patterns in Unaligned Protein Sequences | en
dc.type | Master Thesis | en
uws-etd.degree | Master of Mathematics | en
uws-etd.degree.department | School of Computer Science | en
uws.peerReviewStatus | Unreviewed | en
uws.scholarLevel | Graduate | en
uws.typeOfResource | Text | en

Files

Original bundle

Name: cliang2001.pdf
Size: 428.76 KB
Format: Adobe Portable Document Format