Suppose we obtained the highest similarity score s by comparing a query sequence with one member of a sequence database. To determine its mathematical significance, we have to calculate the similarity score between the query and all members of the database, and then determine the average and the standard deviation of these scores.

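The measure of mathematical significance is the so-called Student t value, which allows a single value to be compared with a database mean calculated from n entries:

\[ t = \frac{s - \bar{s}}{\sigma \sqrt{1 + 1/n}} \]

where \bar{s} is the average and \sigma the standard deviation of the database scores. For a large database the factor \sqrt{1 + 1/n} is close to 1, so t is essentially (s - \bar{s}) / \sigma.
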
This quantity expresses the distance of a score from the average, in units of standard deviation. The higher the value, the higher the significance. The significance is characterized by a value taken from the Student table for a given t and a given sample size (the number of database members). The table contains probability values that are referred to as "significance levels". A value of 0.005 (often written as p<0.005) means that the probability of finding score s by chance is smaller than 0.5%. "Unique" sequences may have much smaller probabilities, and the probabilities are automatically calculated by most database searching programs. It is worth mentioning that the Student test assumes that the scores are normally distributed (Gaussian), which is only approximately true. The test was developed by an Englishman, William Gosset, who published it under the pseudonym Student in 1908.
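
The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: t is taken as the distance of the best score from the database mean in units of the sample standard deviation, with the finite-sample factor sqrt(1 + 1/n); the function names t_value and approx_p_value and the example scores are hypothetical; and the probability is estimated from a normal approximation to the Student distribution (reasonable only when the database is large) rather than read from a Student table.

```python
import math
import statistics


def t_value(best_score, database_scores):
    """Distance of best_score from the database mean, in standard deviations.

    Assumption: the sample standard deviation (n - 1 denominator) is used,
    together with the finite-sample factor sqrt(1 + 1/n).
    """
    n = len(database_scores)
    mean = statistics.mean(database_scores)
    sd = statistics.stdev(database_scores)
    return (best_score - mean) / (sd * math.sqrt(1 + 1 / n))


def approx_p_value(t):
    """One-sided tail probability of finding a value >= t by chance.

    Assumption: normal approximation to the Student distribution,
    which is adequate only for large n.
    """
    return 0.5 * math.erfc(t / math.sqrt(2))


# Hypothetical similarity scores of the query against database members.
scores = [12, 15, 11, 14, 13, 16, 12, 14, 13, 15]
best = 40

t = t_value(best, scores)
p = approx_p_value(t)
print(f"t = {t:.2f}, approximate p = {p:.3g}")
```

With real database searches the scores are not exactly Gaussian, so a probability obtained this way should be read as a rough indication rather than an exact significance level.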