3. References by subjects


3.1 Algorithms, structure



Books
Adams, M. D., Fields, C. and Venter, J. C. (1994). Automated DNA Sequencing and Analysis. New York: Academic Press, 368 pages.
Bishop, M. J. (1994). Guide to Human Genome Computing. London: Academic Press, 350 pages.
Brutlag, D. L. and Sternberg, M. J. E. (1996). Sequences and Topology. London: Current Biology Ltd., 427 pages.
Creighton, T. E. (1993). Proteins: Structures and Molecular Properties (2nd ed.). New York: Freeman.
Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory (1st ed.). New York NY: John Wiley and Sons Inc.
Doolittle, R. F. (1986). Of Urfs and Orfs: A Primer on How to Analyze Derived Amino Acid Sequences. University Science Books, Mill Valley, California.
Doolittle, R. F. (1990). Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences (1st ed.). Methods in Enzymology Volume 183, New York: Academic Press.
Doolittle, R. F. (1996). Computer Methods for Macromolecular Sequence Analysis. (Vol. 266). New York: Academic Press. 711 Pages.
Fasman, G. D. (1989). Prediction of Protein Structure and the Principles of Protein Conformation. New York NY: Plenum Press.
Feller, W. (1968). An Introduction to Probability Theory and Its Applications. 3rd Edition. New York: John Wiley and Sons.
James, M. (1985). Classification Algorithms (1st ed.). New York, NY: John Wiley and Sons.
Gribskov, M. and Devereux, J. (1991). Sequence Analysis Primer. New York: Stockton Press, 279 pages.
Gusfield, D. (1997). Algorithms on Strings, Trees and Sequences (1st ed.). Cambridge, UK: Cambridge University Press, 534 pages.
Hunter, L. (1993). Artificial Intelligence and Molecular Biology. Menlo Park, CA: AAAI Press, 470 pages.
Hunter, L., Searls, D. and Shavlik, J. (1993). First International Conference on Intelligent Systems for Molecular Biology. Menlo Park, CA.: AAAI Press.
Knuth, D. E. (1973). Sorting and Searching. Reading, Mass.: Addison-Wesley.
Lander, E. S. and Waterman, M. S. (1995). Calculating the Secrets of Life: Applications of the Mathematical Sciences in Molecular Biology. Washington D. C.: National Academy Press, 285 pages.
Lesk, A. (1991). Protein Architecture: A Practical Approach. Oxford: IRL Press at Oxford University Press. 287 pages.
Neapolitan, R. E. (1990). Probabilistic Reasoning in Expert Systems: Theory and Algorithms. New York, New York: John Wiley and Sons.
Sankoff, D. and Kruskal, J. B. (1983). Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Reading, Massachusetts: Addison-Wesley. 382 pages.
Schulze-Kremer, S. (1994). Advances in Molecular Bioinformatics. Washington D.C.: IOS Press, 259 pages.
Smith, D. W. (1994). Biocomputing: Informatics and Genome Projects. New York: Academic Press Inc., 336 pages.
Trifonov, E. N. and Brendel, V. (1986). Gnomic: A Dictionary of Genetic Codes. Balaban Publishers, Philadelphia, Pennsylvania. 272 pages.
von Heijne, G. (1987). Sequence Analysis in Molecular Biology: Treasure Trove or Trivial Pursuit. Academic Press, New York. 188 pages.
Waterman, M. (1988). Mathematical Methods for DNA Sequences, CRC Press, Cleveland Ohio. 283 Pages.
Waterman, M. S. (1995). Introduction to Computational Biology. Chapman & Hall Press, London 430 pages.
Reviews
Altschul, S. F., Boguski, M. S., Gish, W. and Wootton, J. C. (1994). Issues in searching molecular sequence databases. Nat Genet 6 (2), 119-29.
Boguski, M. S. (1992). Computational sequence analysis revisited: new databases, software tools, and the research opportunities they engender. J Lipid Res, 33(7), 957-74.
Chao, K.-M., Hardison, R. C. and Miller, W. (1994). Recent developments in linear-space alignment methods: A survey. J. Computational Biology 1 (4), 271-291.
Doolittle, R. F. (1994). Protein sequence comparisons: searching databases and aligning sequences. Curr Opin Biotechnol 5 (1), 24-8.
Felsenstein, J. (1988). Phylogenies from molecular sequences: inference and reliability. Annual Review of Genetics 22, 521-565.
Fischer, D., Rice, D., Bowie, J. U. and Eisenberg, D. (1996). Assigning amino acid sequences to 3-dimensional protein folds. Faseb J 10 (1), 126-36.
Fischer, C., Schweigert, S., Spreckelsen, C., & Vogel, F. (1996). Programs, databases, and expert systems for human geneticists--a survey. Hum Genet, 97(2), 129-37.
Garnier, J. and Levin, J. M. (1991). The protein structure code: what is its present status? Comput Appl Biosci 7 (2), 133-42.
Gelfand, M. S. (1995). Prediction of function in DNA sequence analysis. J Comput Biol 2 (1), 87-115.
Gschwend, D. A., Good, A. C., & Kuntz, I. D. (1996). Molecular docking towards drug discovery. J Mol Recognit, 9(2), 175-86.
Hogue, C. W. (1997). Cn3D: a new generation of three-dimensional molecular structure viewer. Trends Biochem Sci, 22(8), 314-6.
Holm, L. and Sander, C. (1994). Searching protein structure databases has come of age. Proteins 19 (3), 165-73.
Holm, L., & Sander, C. (1996). Mapping the protein universe. Science, 273(5275), 595-603.
Mural, R. J., Einstein, J. R., Guan, X., Mann, R. C. and Uberbacher, E. C. (1992). An artificial intelligence approach to DNA sequence feature recognition. Trends Biotechnol 10 (1-2), 66-9.
Rost, B. and Sander, C. (1994). Structure prediction of proteins--where are we now? Curr Opin Biotechnol 5 (4), 372-80.
Russell, R. B., & Sternberg, M. J. (1995). Structure prediction. How good are we? Curr Biol, 5(5), 488-90.
Stormo, G. D. (1988). Computer methods for analyzing sequence recognition of nucleic acids. Annu. Rev. Biophys. Biophys. Chem. 17, 241-263.
Tyler, E. C., Horton, M. R. and Krause, P. R. (1991). A review of algorithms for molecular sequence comparison. Comput Biomed Res, 24(1), 72-96.
Vingron, M. and Waterman, M. S. (1994). Sequence alignment and penalty choice. Review of concepts, case studies and implications. J Mol Biol 235 (1), 1-12.
Waterman, M. S. (1994). Parametric and ensemble sequence alignment algorithms. Bull Math Biol, 56(4), 743-67.
White, S. H. (1994). Global statistics of protein sequences: implications for the origin, evolution, and prediction of structure. Annu Rev Biophys Biomol Struct 23, 407-39.
3.2 Databases: what is on the internet
General References on the Internet
Cerf, V. (1991). Networks. Scientific American, 265(3), 72-84.
Dertouzos, M. L. (1991). Communications, Computers and Networks. Scientific American, 265(September), 62.
Engst, A. C. (1996). Internet Starter Kit (4th ed.). Indianapolis, IN: Hayden Books 858 pages.
Estrada, S. (1993). Connecting to the Internet. O'Reilly & Associates Inc., Sebastopol, CA. 170 pages.
Jennings, D. M., Landweber, L. H., Fuchs, I. H., Farber, D. J., and Adrion, W. R. (1986). Computer Networking for Scientists. Science 231, 943-950.
Kehoe, B. P. (1993). Zen and the Art of the Internet (2nd ed.). Englewood Cliffs, NJ 07632: P.T.R. Prentice Hall.
Krol, E. (1992). The Whole Internet User's Guide and Catalog (2nd ed.). Sebastopol, California: O'Reilly and Associates, Inc., 376 pages.
Quarterman, J. S. and Carl-Mitchell, S. (1994). The Internet Connection: System Connectivity and Configuration. Addison Wesley Publishing Company, Menlo Park, CA. 270 pages.
Tesler, L. G. (1991). Networked Computing in the 1990's. Scientific American, 265(Sept.), 86.
Shimomura, T. (1996). Takedown. Hyperion Press, New York. 324 pages.
Walsh, J. (1988). Designs on a National Research Network. Science, 239, 861.
Appel, R. D., Sanchez, J.-C., Bairoch, A., Golaz, O., Ravier, F., Pasquali, C., Hughes, G. J. and Hochstrasser, D. F. (1996). The Swiss-2DPAGE database of two-dimensional polyacrylamide gel electrophoresis, its status in 1995. Nucleic Acids Res., 24(1), 180-181.
Bairoch, A., Bucher, P. and Hofmann, K. (1996). The Prosite Database, its status in 1995. Nucleic Acids Res., 24(1), 189-196.
Bairoch, A. and Apweiler, R. (1996). The Swiss-Prot Protein sequence data bank and its new supplement Trembl. Nucleic Acids Res., 24(1), 21-25.
Bairoch, A. (1996). The ENZYME Data Bank in 1995. Nucleic Acids Res., 24(1), 221-222.
Bairoch, A. (1991). SEQANALREF: a sequence analysis bibliographic reference databank. Comput Appl Biosci, 7(2), 268.
Barker, W. C., George, D. G. and Hunt, L. T. (1990). Protein sequence database. Methods Enzymol, 183, 31-49.
Benson, D. A., Boguski, M., Lipman, D. J. and Ostell, J. (1996). GenBank. Nucleic Acids Res., 24(1), 1-5.
Bleasby, A., Griffiths, P., Harper, R., Hines, D., Hoover, K., Kristofferson, D., Marshall, S., O'Reilly, N. and Sundvall, M. (1992). Electronic communications and the new biology. Nucleic Acids Res, 20 (16), 4127-4128.
Brandt, K. A. (1993). The GDB Human Genome Data Base: a source of integrated genetic mapping and disease data. Bull Med Libr Assoc, 81(3), 285-92.
Chiang, D. (1994). Reaching NLM through the Internet. Med Ref Serv Q, 13(1), 83-92.
Cinkosky, M., Fickett, J. W., Gilna, P. and Burks, C. (1991). Electronic Data Publishing and GenBank. Science, 252 (31 May), 1273-1277.
Cuticchia, A. J., Fasman, K. H., Kingsbury, D. T., Robbins, R. J. and Pearson, P. L. (1993). The GDB human genome data base anno 1993. Nucleic Acids Res, 21(13), 3003-6.
Engels, W. R. (1993). Contributing software to the internet: the Amplify program. Trends Biochem Sci, 18(11), 448-50.
Fasman, K. H., Letovsky, S. I., Cottingham, R. W. and Kingsbury, D. T. (1996). Improvements to the GDB™ Human Genome Data Base. Nucleic Acids Res., 24(1), 57-63.
Frey, A. H. (1994). The internet biologist [news]. Faseb J, 8(14), 1110.
Frisse, M. E., Kelly, E. A. and Metcalfe, E. S. (1994). An Internet primer: resources and responsibilities. Acad Med, 69(1), 20-4.
Fuchs, R. (1994). Sequence analysis by electronic mail: a tool for accessing Internet e-mail servers. Comput Appl Biosci, 10(4), 413-7.
George, D. G., Barker, W. C., Mewes, H. W., Pfeiffer and Tsugita, A. (1996). The PIR-International Protein Sequence Database. Nucleic Acids Res., 24(1), 17-20.
Heumann, K., George, D. and Mewes, H. W. (1994). A new concept of sequence data distribution on wide area networks. Comput Appl Biosci, 10(5), 519-26.
Holm, L. and Sander, C. (1996). The FSSP database: fold Classification based on structure-structure alignment of proteins. Nucleic Acids Res., 24(1), 206-209.
Hutchinson, F. and Donnellan, J. E., Jr. (1994). Yale database for DNA sequence changes in mutagenesis. Nucleic Acids Res, 22(17), 3566-8.
Huysmans, M., Richelle, J. and Wodak, S. J. (1991). SESAM: a relational database for structure and sequence of macromolecules. Proteins, 11(1), 59-76.
Jacobson, D. (1994). The World Wide Web for biologists. Protein Sci, 3(11), 2159-61.
Jones, R. (1992). Alerting users to relevant new entries in the GenBank DNA sequence database. Comput Appl Biosci, 8(2), 199.
Keen, G. et al. (1996). The Genome Sequence Database (GSDB): Meeting the challenge of genomic sequencing. Nucleic Acids Res., 24(1), 13-16.
Krawetz, S. A. (1989). Sequence errors described in GenBank: a means to determine the accuracy of DNA sequence interpretation. Nucleic Acids Res, 17 (10), 3951-7.
O'Donnell, C. (1994). Obtaining software via INTERNET. Methods Mol Biol, 24, 345-54.
Peitsch, M. C., Wells, T. N., Stampf, D. R. and Sussman, J. L. (1995). The Swiss-3DImage collection and PDB-Browser on the World-Wide Web. Trends Biochem Sci, 20(2), 82-4.
Pietrokovski, S., Henikoff, J. G. and Henikoff, S. (1996). The Blocks Database: a system for protein classification. Nucleic Acids Res., 24(1), 197-200.
Roberts, R. J. and Macelis, D. (1996). REBASE - restriction enzymes and methylases. Nucleic Acids Res., 24(1), 223-235.
Rodriguez-Tomé, P., Stoehr, P., Cameron, G. N. and Flores, T. P. (1996). The European Bioinformatics Institute (EBI). Nucleic Acids Res., 24(1), 6-12.
Schneider, R. and Sander, C. (1996). The HSSP database of protein structure-sequence alignments. Nucleic Acids Res., 24(1), 201-205.
Smith, R. H., Gottesman, S., Hobbs, B., Lear, E., Kristofferson, D., Benton, D. and Smith, P. R. (1991). A mechanism for maintaining an up-to-date GenBank database via Usenet. Comput Appl Biosci, 7 (1), 111-2.
Smith, T. F. (1990). The history of the genetic sequence databases. Genomics, 6 (4), 701-7.
Stoehr, P. J. and Omond, R. A. (1989). The EMBL Network File Server. Nucleic Acids Res, 17 (16), 6763.
Williams, G. W. and Gibbs, G. P. (1990). Automatic updating of the EMBL database via EMBNet. Comput Appl Biosci, 6 (2), 122-3.
Williams, R. W. (1994). The Portable Dictionary of the Mouse Genome: a personal database for gene mapping and molecular biology. Mamm Genome, 5(6), 372-5.
Woodsmall, R. M. and Benson, D. A. (1993). Information resources at the National Center for Biotechnology Information. Bull Med Libr Assoc, 81(3), 282-4.
Zehetner, G. and Lehrach, H. (1994). The Reference Library System--sharing biological material and experimental data. Nature, 367(6462), 489-91.

3.3 Patterns, pattern matching

Abarbanel, R. M., Wieneke, P. R., Mansfield, E., Jaffe, D. A. and Brutlag, D. L. (1984). Rapid searches for complex patterns in biological molecules. Nucleic Acids Res. 12, 263-280.
Aho, A. V. and Corasick, M. J. (1975). Fast pattern matching: An aid to bibliographic search. Commun. ACM 18, 333-340.
Bairoch, A., Bucher, P., & Hofmann, K. (1997). The PROSITE database, its status in 1997. Nucleic Acids Res, 25(1), 217-21.
Bork, P. (1989). Recognition of functional regions in primary structures using a set of property patterns. FEBS Lett 257 (1), 191-5.
Bork, P. and Koonin, E. V. (1996). Protein Sequence Motifs. Current Opinion in Structural Biology 6 (3), 366-376.
Gusfield, D. (1997). Algorithms on Strings, Trees and Sequences (1st ed.). Cambridge, UK: Cambridge University Press. Chapters 1-5.
Henikoff, S. (1996). Scores for Sequence Searches. Current Opinion in Structural Biology 6 (3), 353-360.
Knuth, D. E. (1973). The Art of Computer Programming, Volume 3. Sorting and Searching. Reading Mass: Addison-Wesley.
Knuth, D. E., Morris, J. H. and Pratt, V. R. (1977). Fast pattern matching in strings. SIAM J. Comput. 6, 323-350.
Landau, G.M., Vishkin, U. and Nussinov, R. (1986). An efficient string matching algorithm with k differences for nucleotide and amino acid sequences. Nucleic Acids Res. 14, 31-46.
Mehldau, G. and Myers, G. (1993). A system for pattern matching applications on biosequences. Comput Appl Biosci 9 (3), 299-314.
Nevill-Manning, C., Sethi, K., Wu, T. D., & Brutlag, D. L. (1997). Enumerating and Ranking Discrete Motifs. ISMB-97, 4, 202-209.
Nussinov, R. (1983). An efficient code searching for sequence homology and DNA duplication. J. Theor. Biol. 100, 319-28.
Nussinov, R. (1983). Efficient algorithms for searching for exact repetition of nucleotide sequences. J. Mol. Evol. 19, 283-5.
Rohde, K. and Bork, P. (1993). A fast, sensitive pattern-matching approach for protein sequences. Comput Appl Biosci 9 (2), 183-9.
Saqi, M. A. and Sternberg, M. J. (1994). Identification of sequence motifs from a set of proteins with related function. Protein Eng, 7(2), 165-71.
Saurin, W. and Marliere, P. (1987). Matching relational patterns in nucleic acid sequences. Comput Appl Biosci, 3 (2), 115-20.
Sibbald, P. R. and Argos, P. (1990). Scrutineer: a computer program that flexibly seeks and describes motifs and profiles in protein sequence databases [published erratum appears in Comput Appl Biosci 6, 431]. Comput Appl Biosci, 6 (3), 279-88.
Smith, H. O., Annau, T. M. and Chandrasegaran, S. (1990). Finding sequence motifs in groups of functionally related proteins. Proc Natl Acad Sci U S A, 87 (2), 826-30.
Smith, R. (1988). A finite state machine algorithm for finding restriction sites and other pattern matching applications. Comput Appl Biosci, 4 (4), 459-65.
Smith, R. F. and Smith, T. F. (1992). Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for use in comparative protein modelling. Protein Eng 5 (1), 35-41.
Staden, R. (1991). Screening protein and nucleic acid sequences against libraries of patterns. Dna Seq, 1 (6), 369-74.
Staden, R. (1994). Staden: searching for motifs in nucleic acid sequences. Methods Mol Biol, 25, 93-102.
Staden, R. (1994). Staden: searching for motifs in protein sequences. Methods Mol Biol, 25, 131-9.
Staden, R. (1994). Staden: using patterns to analyze nucleic acid sequences. Methods Mol Biol, 25, 103-11.
Staden, R. (1994). Staden: using patterns to analyze protein sequences. Methods Mol Biol, 25, 141-54.
Sternberg, M. J. (1991). PROMOT: a FORTRAN program to scan protein sequences against a library of known motifs. Comput Appl Biosci, 7 (2), 257-60.
Stormo, G. D. (1990). Consensus patterns in DNA. Methods Enzymol 183, 211-21.
Valle, G. (1993). Discover 1: a new program to search for unusually represented DNA motifs. Nucleic Acids Res 21 (22), 5152-6.
Bailey, T.L. and Elkan, C. (1995), "Unsupervised learning of multiple motifs in biopolymers using EM", Machine Learning 21, pp. 51-80.
Bairoch, A. and Apweiler, R. (1997). The SWISS-PROT protein sequence data bank and its supplement TrEMBL. Nucleic Acids Res, 25(1), 31-6.
Barton, G. J. (1990). Protein multiple sequence alignment and flexible pattern matching. Meth Enzymol 183, 403-427.
Bork, P., & Gibson, T. J. (1996). Applying motif and profile searches. Methods Enzymol, 266, 162-84.
Brazma, A., Jonassen, I., Ukkonen, E. and Vilo, J. (1996) "Discovering patterns and subfamilies in biosequences," in D.J. States et al. (Eds.) Proc. Fourth International Conference on Intelligent Systems for Molecular Biology, 34-43. Menlo Park, CA: AAAI Press.
Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1978). A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure, Nat. Biomed. Research Foundation, pages 345-352.
Henikoff, S. and Henikoff, J.G. (1991) "Automated assembly of protein blocks for database searching" Nucleic Acids Res., 19, 6565-6572.
Henikoff, S., Greene, E. A., Pietrokovski, S., Bork, P., Attwood, T. K. and Hood, L. (1997). Gene Families: The taxonomy of protein paralogs and chimeras. Science, 278(24 October 1997), 609-614.
Jimenez-Montano, M. A., and Zamora-Cortina, L. 1981. Evolutionary model for the generation of amino acid sequences and its application to the study of mammal alpha-hemoglobin chains. In Proceedings of the Seventh International Biophysics Congress, Mexico City.
Kidera, A., Yonishi, Y., Masahito, O., Ooi, T., and Scheraga, H. A. (1985). Statistical analysis of the physical properties of the twenty naturally occurring amino acids. J Prot Chem 4, 23-55.
Livingstone, C. D., & Barton, G. J. (1996). Identification of functional residues and secondary structure from protein multiple sequence alignment. Methods Enzymol, 266, 497-512.
Miyata, T., Miyazawa, S., and Yasunaga, T. (1979). Two types of amino acid substitution in protein evolution. J Mol Evol 12, 219-236.
Mocz, G. (1995). Fuzzy cluster analysis of simple physicochemical properties of amino acids for recognizing secondary structure in proteins. Protein Sci 4, 1178-1187.
Moore, J. F., Engelberg, A., & Bairoch, A. (1988). Using PC/Gene for protein and nucleic acid analysis. Biotechniques, 6, 566-572.
Patthy, L. (1987). Detecting homology of distantly related proteins with consensus sequences. J. Mol. Biol., 198, 567-577.
Patthy, L. (1991). Modular Exchange Principles in Proteins. Current Opinion in Structural Biology, 1, 351-361.
Patthy, L. (1996). Consensus Approaches in Detection of Distant Homologies. Methods in Enzymology, 266, 184-197.
Poch, O., & Delarue, M. (1996). Converting Sequence Block Alignments into Structural Insights. Methods in Enzymology, 266, 662-680.
Posfai, J., Bhagwat, A. S., Posfai, G. and Roberts, R. J. (1989). Predictive motifs derived from cytosine methyltransferases. Nucleic Acids Res 17 (7), 2421-35.
Sander, C. and Schneider, R. (1991). Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins: Structure, Function, and Genetics 9, 56-68.
Saqi, M. A. and Sayle, R. (1994). PdbMotif--a tool for the automatic identification and display of motifs in protein structures. Comput Appl Biosci, 10(5), 545-6.
Shannon, C.E. (1948) "A mathematical theory of communication," Bell System Technical Journal, 27, 398-403.
Staden, R. (1988). Methods to define and locate patterns of motifs in sequences. Comput Appl Biosci, 4 (1), 53-60.
Smith, R. F. and Smith, T. F. (1990). Automatic generation of primary sequence patterns from sets of related protein sequences. Proc Natl Acad Sci USA 87, 118-122.
Taylor, W. R. (1986). The classification of amino acid conservation. J Theor Biol 119, 205-218.
Tatusov, R. L., Altschul, S. F. and Koonin, E. V. (1994). Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc Natl Acad Sci U S A 91 (25), 12091-5.
Tatusov, R. L., Koonin, E. V. and Lipman, D. J. (1997). A Genomic Perspective of Protein Families. Science, 278(24 October), 631.
Taylor, W. R. (1986). Identification of protein sequence homology by consensus template alignment. J. Mol. Biol., 188, 233-258.
Thornton, J. M. and Gardner, S. P. (1989). Protein motifs and data-base searching. Trends Biochem Sci, 14 (7), 300-4.
Witten, I.H., Neal, R., and Cleary, J.G. (1987) "Arithmetic coding for data compression" Communications of the Association for Computing Machinery, 30 (6) 520-540, June. Reprinted in C Gazette 2 (3) 4-25, December, 1987.
Wu, T. D., and Brutlag, D. L. (1995). Identification of protein motifs using conserved amino acid properties and partitioning techniques. ISMB-95, pages 402-410.
Wu, T. D., & Brutlag, D. L. (1996). Discovering Empirically Conserved Amino Acid Substitution Groups in Databases of Protein Families. ISMB-96, 3, 230-240.
3.4 Scoring matrices, scoring systems

Allison, L. (1993). Normalization of affine gap costs used in optimal sequence alignment. J Theor Biol 161 (2), 263-9.
Altschul, S. F. (1989). Gap costs for multiple sequence alignment. J Theor Biol, 138(3), 297-309.
Altschul, S. F. (1991). Amino acid substitution matrices from an information theoretic perspective. J Mol Biol, 219(3), 555-65.
Altschul, S. F. (1993). A protein alignment scoring system sensitive at all evolutionary distances. J Mol Evol 36 (3), 290-300.
Brendel, V., Bucher, P., Nourbakhsh, I. R., Blaisdell, B. E., & Karlin, S. (1992). Methods and algorithms for statistical analysis of protein sequences. Proc Natl Acad Sci U S A, 89(6), 2002-6.
Benner, S. A., Cohen, M. A. and Gonnet, G. H. (1993). Empirical and structural models for insertions and deletions in the divergent evolution of proteins. J Mol Biol 229 (4), 1065-82.
Brendel, V. (1996). Statistical analysis of protein sequences. In H. Villar (Ed.), Advances in Computational Biology, (Vol. 2, pp. 121-160). Greenwich, CT.: JAI Press.
Brendel, V., & Karlin, S. (1994). Applications of statistical criteria in protein sequence analysis: case study of yeast RNA polymerase II subunits. Comput Chem, 18(3), 251-3.
Brutlag, D. L., Dautricourt, J. P., Maulik, S. and Relph, J. (1990). Improved sensitivity of biological sequence database searches. Comput Appl Biosci 6 (3), 237-45.
Collins, J. F., Coulson, A. F., & Lyall, A. (1988). The significance of protein sequence similarities. Comput Appl Biosci, 4(1), 67-71.
Dayhoff, M. O., Schwartz, R. M. and Orcutt, B. C. (1978). A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure 5 (Suppl. 3), 345-352.
Gonnet, G. H., Cohen, M. A. and Benner, S. A. (1992). Exhaustive Matching of the Entire Protein Sequence Database. Science 256 (5062), 1443-5.
Henikoff, S. (1996). Scores for Sequence Searches. Current Opinion in Structural Biology 6 (3), 353-360.
Henikoff, S., & Henikoff, J. G. (1992). Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A, 89(22), 10915-9.
Henikoff, S., & Henikoff, J. G. (1993). Performance evaluation of amino acid substitution matrices. Proteins, 17(1), 49-61.
Johnson, M. S., Overington, J. P. and Blundell, T. L. (1993). Alignment and searching for common protein folds using a data bank of structural templates. J Mol Biol 231 (3), 735-52.
Jones, D. T., Taylor, W. R. and Thornton, J. M. (1992). The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci, 8 (3), 275-82.
Karlin, S. (1994). Statistical studies of biomolecular sequences: score-based methods. Philos Trans R Soc Lond B Biol Sci, 344(1310), 391-402.
Karlin, S., & Altschul, S. F. (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci U S A, 87(6), 2264-8.
Karlin, S., & Altschul, S. F. (1993). Applications and statistics for multiple high-scoring segments in molecular sequences. Proc Natl Acad Sci U S A, 90(12), 5873-7.
Karlin, S., & Brendel, V. (1992). Chance and statistical significance in protein and DNA sequence analysis. Science, 257(5066), 39-49.
Luthy, R., McLachlan, A. D. and Eisenberg, D. (1991). Secondary structure-based profiles: use of structure-conserving scoring tables in searching protein sequence databases for structural similarities. Proteins 10 (3), 229-239.
Overington, J., Donnelly, D., Johnson, M. S., Sali, A. and Blundell, T. L. (1992). Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds. Protein Sci 1 (2), 216-26.
Pearson, W. R. (1995). Comparison of methods for searching protein sequence databases. Protein Sci, 4(6), 1145-60.
Schwartz, R. M. and Dayhoff, M. O. (1979). Matrices for Detecting Distant Relationships. Atlas of Protein Sequence and Structure 5 (Suppl. 3), 353-358.
Vingron, M. (1996). Near-Optimal Sequence Alignment. Current Opinion in Structural Biology, 6(3), 346-352.
Vogt, G., Etzold, T., & Argos, P. (1995). An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. J. Mol. Biol., 249, 816-831.
Waterman, M. S., & Vingron, M. (1994). Rapid and accurate estimates of statistical significance for sequence data base searches. Proc. Natl. Acad. Sci. USA, 91, 4625-4628.
Wilbur, W. J. (1985). On the PAM matrix model of protein evolution. Mol Biol Evol 2 (5), 434-47.
Zhu, Z. Y., Sali, A. and Blundell, T. L. (1992). A variable gap penalty function and feature weights for protein 3-D structure comparisons. Protein Eng 5 (1), 43-51.

3.5 Sequence alignment


Allison, L., Wallace, C. S. and Yee, C. N. (1992). Finite-state models in the alignment of macromolecules. J Mol Evol, 35 (1), 77-89.
Cantalloube, H., Labesse, G., Chomilier, J., Nahum, C., Cho, Y. Y., Chams, V., Achour, A., Lachgar, A., Mbika, J. P., Issing, W. et al. (1995). Automat and BLAST: comparison of two protein sequence similarity search programs. Comput Appl Biosci 11 (3), 261-72.
Chao, K. M., Zhang, J., Ostell, J. and Miller, W. (1995). A local alignment tool for very long DNA sequences. Comput Appl Biosci 11 (2), 147-53.
Dayhoff, M. O., Schwartz, R. M. and Orcutt, B. C. (1978). A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure 5 (Suppl. 3), 345-352.
Dayhoff, M. O., Barker, W. C. and Hunt, L. T. (1983). Establishing Homologies in Protein Sequences, in Methods in Enzymology, 91, 524-545.
DeLisi, C. and Kanehisa, M. (1984). Assessing the Significance of Local Sequence Homologies. Mathematical Biosciences 69, 77-85.
Doolittle, R. and Fairchild. (1981). Similar amino acid sequences: chance or common ancestry? Science 214, 149-158.
Doolittle, R. F. (1986). Of Urfs and Orfs: A Primer on How to Analyze Derived Amino Acid Sequences. Mill Valley, California: University Science Books.
Feng, D.F., Johnson, M.S. and Doolittle, R.F. (1985). Aligning amino acid sequences: comparison of commonly used methods. J. Mol. Evol. 21, 112-125.
Gribskov, M. (1994). Profile analysis. Methods Mol Biol 25, 247-66.
Grice, J. A., Hughey, R. and Speck, D. (1995). Parallel sequence alignment in limited space. Ismb 3, 145-53.
Krogh, A., Brown, M., Mian, I. S., Sjolander, K. and Haussler, D. (1994). Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol 235 (5), 1501-31.
Huang, X. Q., Hardison, R. C. and Miller, W. (1990). A space-efficient algorithm for local similarities. Comput Appl Biosci 6 (4), 373-81.
Landes, C. and Risler, J. L. (1994). Fast databank searching with a reduced amino-acid alphabet. Comput Appl Biosci 10 (4), 453-4.
Lawrence, C. E. and Reilly, A. A. (1990). An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins, 7 (1), 41-51.
Needleman, S. B. and Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443-453.
Pearson, W. R. and Miller, W. (1992). Dynamic programming algorithms for biological sequence comparison. Methods Enzymol, 210, 575-601.
Pearson, W. R. (1991). Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics, 11 (3), 635-50.
Pearson, W. R. (1995). Comparison of methods for searching protein sequence databases. Protein Sci 4 (6), 1145-60.
Rechid, R., Vingron, M. and Argos, P. (1989). A new interactive protein sequence alignment program and comparison of its results with widely used algorithms. Comput Appl Biosci, 5 (2), 107-13.
Resenchuk, S. M. and Blinov, V. M. (1995). ALIGNMENT SERVICE: creation and processing of alignments of sequences of unlimited length. Comput Appl Biosci 11 (1), 7-11.
Reeck, G. R., de Haen, C., Teller, D. C., Doolittle, R. F., Fitch, W. M., Dickerson, R. E. (1987). "Homology" in Proteins and Nucleic Acids: A Terminology Muddle and a Way out of It. Cell 50, 667.
Searls, D. B. and Murphy, K. P. (1995). Automata-theoretic models of mutation and alignment. Ismb 3, 341-9.
Smith, T. F. and Waterman, M. (1981). Identification of common molecular subsequences. J. Mol. Biol. 147, 195-197.
Smith, T., Waterman, M. and Fitch, W. (1981). Comparative biosequence metrics. J. Mol. Evol. 18, 38-46.
Streletc, V. B., Shindyalov, I. N., Kolchanov, N. A. and Milanesi, L. (1992). Fast, statistically based alignment of amino acid sequences on the base of diagonal fragments of DOT-matrices. Comput Appl Biosci, 8 (6), 529-34.
Waterman, M. S., Eggert, M. and Lander, E. (1992). Parametric sequence comparisons. Proc Natl Acad Sci U S A, 89 (13), 6090-3.
 
Scoring Systems
Allison, L. (1993). Normalization of affine gap costs used in optimal sequence alignment. J Theor Biol 161 (2), 263-9.
Altschul, S. F. (1993). A protein alignment scoring system sensitive at all evolutionary distances. J Mol Evol 36 (3), 290-300.
Benner, S. A., Cohen, M. A. and Gonnet, G. H. (1993). Empirical and structural models for insertions and deletions in the divergent evolution of proteins. J Mol Biol 229 (4), 1065-82.
Brutlag, D. L., Dautricourt, J. P., Maulik, S. and Relph, J. (1990). Improved sensitivity of biological sequence database searches. Comput Appl Biosci 6 (3), 237-45.
Gonnet, G. H., Cohen, M. A. and Benner, S. A. (1992). Exhaustive Matching of the Entire Protein Sequence Database. Science 256 (5062), 1443-5.
Henikoff, S. (1996). Scores for Sequence Searches. Current Opinion in Structural Biology 6 (3), 353-360.
Johnson, M. S., Overington, J. P. and Blundell, T. L. (1993). Alignment and searching for common protein folds using a data bank of structural templates. J Mol Biol 231 (3), 735-52.
Jones, D. T., Taylor, W. R. and Thornton, J. M. (1992). The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci, 8 (3), 275-82.
Luthy, R., McLachlan, A. D. and Eisenberg, D. (1991). Secondary structure-based profiles: use of structure-conserving scoring tables in searching protein sequence databases for structural similarities. Proteins 10 (3), 229-239.
Overington, J., Donnelly, D., Johnson, M. S., Sali, A. and Blundell, T. L. (1992). Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds. Protein Sci 1 (2), 216-26.
Schwartz, R. M. and Dayhoff, M. O. (1979). Matrices for Detecting Distant Relationships. Atlas of Protein Sequence and Structure 5 (Suppl. 3), 353-358.
Wilbur, W. J. (1985). On the PAM matrix model of protein evolution. Mol Biol Evol 2 (5), 434-47.
Zhu, Z. Y., Sali, A. and Blundell, T. L. (1992). A variable gap penalty function and feature weights for protein 3-D structure comparisons. Protein Eng 5 (1), 43-51.
 
Aligning Sequences to Structures
Bryant, S. H. and Altschul, S. F. (1995). Statistics of sequence-structure threading. Curr Opin Struct Biol 5 (2), 236-44.
Casari, G., Sander, C. and Valencia, A. (1995). A method to predict functional residues in proteins. Nat Struct Biol 2 (2), 171-8.
Diederichs, K. (1995). Structural superposition of proteins with unknown alignment and detection of topological similarity using a six-dimensional search algorithm. Proteins 23 (2), 187-95.
Fischer, D., Rice, D., Bowie, J. U. and Eisenberg, D. (1996). Assigning amino acid sequences to 3-dimensional protein folds. Faseb J 10 (1), 126-36.
Godzik, A. and Skolnick, J. (1994). Flexible algorithm for direct multiple alignment of protein structures and sequences. Comput Appl Biosci 10 (6), 587-96.
Holm, L. and Sander, C. (1993). Protein structure comparison by alignment of distance matrices. J Mol Biol 233 (1), 123-38.
Holm, L. and Sander, C. (1996). The FSSP database: fold classification based on structure-structure alignment of proteins. Nucleic Acids Res. 24 (1), 206-209.
Lathrop, R. H. and Smith, T. F. (1996). Global optimum protein threading with gapped alignment and empirical pair score functions. J Mol Biol 255 (4), 641-65.
Miller, R. T., Jones, D. T. and Thornton, J. M. (1996). Protein fold recognition by sequence threading: tools and assessment techniques. Faseb J 10 (1), 171-8.
Rost, B. and Sander, C. (1994). Structure prediction of proteins--where are we now? Curr Opin Biotechnol 5 (4), 372-80.
Rost, B. (1995). TOPITS: threading one-dimensional predictions into three-dimensional structures. Ismb 3, 314-21.
Sayle, R., Saqi, M., Weir, M. and Lyall, A. (1995). PdbAlign, PdbDist and DistAlign: tools to aid in relating sequence variability to structure. Comput Appl Biosci 11 (5), 571-3.
Schneider, R. and Sander, C. (1996). The HSSP database of protein structure-sequence alignments. Nucleic Acids Res. 24 (1), 201-205.
Wilmanns, M. and Eisenberg, D. (1995). Inverse protein folding by the residue pair preference profile method: estimating the correctness of alignments of structurally compatible sequences. Protein Eng 8 (7), 627-39.

3.6 Fast database searching


Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. (1990). A Basic Local Alignment Search Tool. J. Mol. Biol., 215, 403-410.
Altschul, S. F., Boguski, M. S., Gish, W. and Wootton, J. C. (1994). Issues in searching molecular sequence databases. Nat Genet 6 (2), 119-29.
Barsalou, T. and Brutlag, D. L. (1991). Searching Gene and Protein Sequence Databases. MD Computing, 8(3), 144-149.
Brutlag, D. L., Dautricourt, J. P., Maulik, S. and Relph, J. (1990). Improved sensitivity of biological sequence database searches. Comput Appl Biosci, 6(3), 237-45.
Brutlag, D. L., Dautricourt, J. P., Diaz, R., Fier, J., Moxon, B. and Stamm, R. (1993). BLAZE: An implementation of the Smith-Waterman Comparison Algorithm on a Massively Parallel Computer. Computers and Chemistry 17, 203-207.
Collins, J. F. and Coulson, A. F. (1984). Applications of parallel processing algorithms for DNA sequence analysis. Nucleic Acids Res, 12, 181-192.
Collins, J. F., Coulson, A. F. W. and Lyall, A. (1988). The significance of protein sequence similarities. CABIOS 4, 67-71.
Galper, A. R. and Brutlag, D. L. (1990). Parallel Similarity Search and Alignment with the Dynamic Programming Method (KSL Report 90-74). Stanford University.
Gish, W. and States, D. J. (1993). Identification of protein coding regions by database similarity search. Nat Genet 3 (3), 266-72.
Gonnet, G. H., Cohen, M. A. and Benner, S. A. (1992). Exhaustive matching of the entire protein sequence database. Science, 256, 1443-5.
Gribskov, M., McLachlan, A. D. and Eisenberg, D. (1987). Profile analysis: Detection of distantly related proteins. Proc. Natl. Acad. Sci. USA 84, 4355-4358.
Lipman, D. J. and Pearson, W. R. (1985). Rapid and Sensitive Protein Similarity Searches. Science 227, 1435-1441.
Liuni, S., Prunella, N., Pesole, G., D'Orazio, T., Stella, E. and Distante, A. (1993). SIMD parallelization of the WORDUP algorithm for detecting statistically significant patterns in DNA sequences. Comput Appl Biosci 9 (6), 701-7.
Myers, E. W. and Miller, W. (1988). Optimal alignments in linear space. CABIOS 4, 11-17.
Pearson, W. R. and Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci USA 85, 2444-2448.
Pearson, W. R. (1986). Sensitivity and Selectivity in Protein Sequence Comparison. In Methods in Protein Sequence Analysis, Clifton, New Jersey: Humana Press.
Pearson, W. R. (1994). Using the FASTA program to search protein and DNA sequence databases. Methods Mol Biol 25, 365-89.
Pesole, G., Prunella, N., Liuni, S., Attimonelli, M. and Saccone, C. (1992). WORDUP: an efficient algorithm for discovering statistically significant patterns in DNA sequences. Nucleic Acids Res 20 (11), 2871-5.
Staden, R. (1994). Staden: comparing sequences. Methods Mol Biol 25, 155-70.
Strelets, V. B., Ptitsyn, A. A., Milanesi, L. and Lim, H. A. (1994). Data bank homology search algorithm with linear computation complexity. Comput Appl Biosci 10 (3), 319-22.
Wilbur, W.J. and Lipman, D.J. (1983). Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. USA 80, 726-30.

3.7 Multiple sequence alignment


Carrillo, H. and Lipman, D. (1988). The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48, 1073-1082.
Corpet, F. (1988). Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res., 16, 10881-10890.
Dolz, R. (1994). GCG: production of multiple sequence alignment. Methods Mol Biol 24, 83-99.
Eddy, S. R. (1995). Multiple alignment using hidden Markov models. Ismb, 3, 114-20.
Eisen, J. A. (1997). The Genetic Data Environment. A user modifiable and expandable multiple sequence analysis package. Methods Mol Biol, 70, 13-38.
Feng, D. F., and Doolittle, R. F. (1987). Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25, 351-360.
Feng, D. F. and Doolittle, R. F. (1996). Progressive Alignment of Amino Acid Sequences and Construction of Phylogenetic Trees from Them. Methods in Enzymology, 266, 368-382.
Galas, D.J., Eggert, M. and Waterman, M.S. (1985). Rigorous pattern-recognition methods for DNA sequences. Analysis of promoter sequences from Escherichia coli. J. Mol. Biol. 186, 117-128.
Gotoh, O. (1993). Optimal alignment between groups of sequences and its application to multiple sequence alignment. Comput Appl Biosci 9 (3), 361-70.
Gotoh, O. (1996). Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J Mol Biol, 264(4), 823-38.
Henikoff, S. and Henikoff, J. G. (1997). Embedding strategies for effective use of information from multiple sequence alignments. Protein Sci, 6(3), 698-705.
Higgins, D. G. and Sharp, P. M. (1988). CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene, 73(1), 237-44.
Higgins, D. G. and Sharp, P. M. (1989). Fast and sensitive multiple sequence alignments on a microcomputer. Comput Appl Biosci, 5(2), 151-3.
Higgins, D. G., Bleasby, A. J. and Fuchs, R. (1992). Clustal V: improved software for multiple sequence alignment. CABIOS, 8(2), 189-191.
Higgins, D. G. (1994). CLUSTAL V: multiple alignment of DNA and protein sequences. Methods Mol Biol 25, 307-18.
Higgins, D. G., Thompson, J. D. and Gibson, T. J. (1996). Using CLUSTAL for Multiple Sequence Alignments. Methods in Enzymology, 266, 383-401.
Hughey, R. and Krogh, A. (1996). Hidden Markov models for sequence analysis: extension and analysis of the basic method. Comput Appl Biosci, 12(2), 95-107.
Johnson, M. S., and Doolittle, R. F. (1986). A method for the simultaneous alignment of three or more amino acid sequences. J. Mol. Evol. 23, 267-278.
Karlin, S. and Ghandour, G. (1985). Comparative statistics for DNA and protein sequences: Multiple sequence analysis. Proc. Natl. Acad. Sci. USA 82, 6186-6190.
Karlin, S., Morris, D., Ghandour, G., and Leung, M. Y. (1988). Efficient algorithms for molecular sequence analysis. Proc. Natl. Acad. Sci. U. S. A. 85, 841-845.
Lawrence, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S., Neuwald, A. F. and Wootton, J. C. (1993). Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262 (5131), 208-14.
Lipman, D. J., Altschul, S. F. and Kececioglu, J. D. (1989). A tool for multiple sequence alignment. Proc Natl Acad Sci U S A, 86(12), 4412-5.
Martinez, H. M. (1983). An efficient method for finding repeats in molecular sequences. Nucleic Acids Res. 11, 4629-4634.
Martinez, H. M. (1988). A flexible multiple sequence alignment program. Nucleic Acids Res. 16, 1683-1691.
Murata, M., Richardson, J. S., and Sussman, J. L. (1985). Simultaneous comparison of three protein sequences. Proc. Natl. Acad. Sci. U. S. A. 82, 3073-3077.
Myers, G., Selznick, S., Zhang, Z. and Miller, W. (1996). Progressive multiple alignment with constraints. J Comput Biol, 3(4), 563-72.
Russell, R. B. and Barton, G. J. (1992). Multiple protein sequence alignment from tertiary structure comparison: assignment of global and residue confidence levels. Proteins, 14(2), 309-23.
Sobel, E., and Martinez, H. M. (1986). A multiple sequence alignment program. Nucleic Acids Res. 14, 363-374.
Subbiah, S. and Harrison, S. C. (1989). A method for multiple sequence alignment with gaps. J Mol Biol, 209(4), 539-48.
Taylor, W. R. (1986). Identification of protein sequence homology by consensus template alignment. J. Mol. Biol. 188, 233-258.
Taylor, W. R. (1987). Multiple sequence alignment by a pairwise algorithm. Comput. Appl. Biosci. 3, 81-87.
Taylor, W. R. (1996). Multiple Protein Sequence Alignment: Algorithms and Gap Insertion. Methods in Enzymology, 266, 343-367.
Thompson, J. D., Higgins, D. G. and Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22 (22), 4673-80.
Vingron, M. and Argos, P. (1989). A fast and sensitive multiple sequence alignment algorithm. Comput Appl Biosci, 5(2), 115-21.
Vingron, M. and Sibbald, P. R. (1993). Weighting in sequence space: a comparison of methods in terms of generalized sequences. Proc Natl Acad Sci U S A, 90(19), 8777-81.
Waterman, M., Arratia, R. and Galas, D.J. (1984). Pattern Recognition in Several Sequences: Consensus and Alignment. Bull. Math. Biol. 46, 515-527.
Waterman, M. S. (1986). Multiple sequence alignment by consensus. Nucleic Acids Res. 14, 9095-9102.

3.8 Protein domain collections

Hegyi, H. and Pongor, S. (1993). Predicting potential domain homologies from FASTA search results. Comput Appl Biosci 9 (3), 371-372.
Fabian, P., Murvai, J., Hatsagi, Z., Vlahovicek, K., Hegyi, H. and Pongor, S. (1997). The SBASE protein domain library, release 5.0: a collection of annotated protein sequence segments. Nucleic Acids Res 25 (1), 240-243.
Pongor, S., Skerl, V., Cserzo, M., Hatsagi, Z., Simon, G. and Bevilacqua, V. (1993). The SBASE domain library: a collection of annotated protein segments. Protein Eng 6 (4), 391-395.
Sonnhammer, E. L., Eddy, S. R. and Durbin, R. (1997). Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28 (3), 405-420.
Sonnhammer, E. L. and Kahn, D. (1994). Modular arrangement of proteins as inferred from analysis of homology. Protein Sci 3 (3), 482-492.
Sonnhammer, E. L. and Durbin, R. (1994). A workbench for large-scale sequence homology analysis. Comput Appl Biosci 10 (3), 301-307.