Checklist for evaluating the plausibility of weak hits.

Examination of the hit

    Is the amino acid distribution consistent with globular (cytosolic, extracellular), integral membrane, coiled-coil, fibrous or random-coil structure?

These are mutually exclusive structural classes that should not overlap within a domain (although they can be juxtaposed in multidomain proteins). A high-scoring match between random-coil regions is not a good indicator of homology.

    Is there structural information known for query or hit?

Knowledge of the three-dimensional structure greatly facilitates the evaluation, because constraints from the hydrophobic core, catalysis etc. can be taken into account.

    Is there a partial overlap of the hit with an established domain class in a reciprocal search?

A partial overlap immediately rules out the potential similarity. By definition, globular domains do not overlap (although one domain can be inserted into a loop of another).

    Is the full domain potentially present?

Globular structure is stabilised by interactions in the hydrophobic core. Half a globular domain is a meaningless concept and very rarely observed.

    Is there a match to all the conserved alignment blocks?

Conserved blocks usually indicate secondary structural elements.

    Is there a match to all highly conserved hydrophobic residues?

These are essential to the given hydrophobic core. Very few exceptions are tolerated.
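
The check on conserved hydrophobic residues can be sketched in a few lines. This is a minimal illustration, not an established tool: the function name, the toy sequences and the choice of hydrophobic residue set are assumptions, and all sequences are assumed to come from one alignment and so to have equal length.

```python
# Assumed hydrophobic set (illustrative; other groupings are equally defensible).
HYDROPHOBIC = set("AVLIMFWC")

def missed_hydrophobic_columns(family, hit):
    """Return alignment columns where every family sequence is hydrophobic
    but the aligned hit residue is not (a gap also counts as a miss)."""
    missed = []
    for col, hit_res in enumerate(hit):
        column = [seq[col] for seq in family]
        if all(res in HYDROPHOBIC for res in column) and hit_res not in HYDROPHOBIC:
            missed.append(col)
    return missed

# Toy alignment (hypothetical sequences).
family = ["MKVLAEDG", "MRILSEDG", "MKVLTQDG"]
hit = "MKDLAEDG"
print(missed_hydrophobic_columns(family, hit))  # [2]
```

More than a very small number of reported columns would argue against the hit.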

    Do most positions that are aligned to unconserved positions in the query have hydrophilic residues?

Surface residues are usually hydrophilic, and are unconserved unless binding other molecules. Multiple mismatched hydrophobic residues are contrary indicators. (Surprisingly frequently, transmembrane regions are erroneously aligned to cytosolic proteins.)
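
A rough way to count hydrophobic residues placed at unconserved positions is shown below. It is a sketch under stated assumptions: a column is treated as unconserved when the family shows at least `min_distinct` different residue types (an arbitrary proxy), and the residue set, function name and sequences are illustrative only.

```python
# Assumed hydrophobic set (illustrative).
HYDROPHOBIC = set("AVLIMFWC")

def hydrophobic_at_variable_columns(family, hit, min_distinct=3):
    """Return (n_hydrophobic, n_variable): how many of the variable family
    columns carry a hydrophobic residue in the hit."""
    variable = [c for c in range(len(hit))
                if len({seq[c] for seq in family}) >= min_distinct]
    n_hydrophobic = sum(1 for c in variable if hit[c] in HYDROPHOBIC)
    return n_hydrophobic, len(variable)

# Toy alignment (hypothetical sequences); columns 4 and 8 are variable.
family = ["MKVLAEDGS", "MRILSEDGT", "MKVLTQDGN"]
print(hydrophobic_at_variable_columns(family, "MKVLLEDGF"))  # (2, 2)
print(hydrophobic_at_variable_columns(family, "MKVLKEDGS"))  # (0, 2)
```

A high ratio, as in the first call, is the pattern expected when a transmembrane segment has been aligned to a soluble domain.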

    Has Pro been aligned to a position in a block where it was not seen before?

Pro is favoured in the N-terminal three residues of an α-helix. Any deeper and it breaks H-bonds. It is allowed on edge strands, but breaks H-bonds on internal strands. Exceptions are rare and cannot be arbitrarily invoked for weak hits.

    Has Gly been aligned to a position in a block where it was not seen before?

The lack of a sidechain reduces helix and strand stability. Gly aligned to small hydrophobic residues (Ala, Val, Cys) may indicate a plausible tight packing arrangement; otherwise, only occasional exceptions may be tolerated.
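
The Pro and Gly questions both ask whether a residue appears in a block column where the family has never shown it. A minimal sketch, assuming equal-length aligned sequences and a block given as a half-open column range (the function name and sequences are hypothetical):

```python
def novel_residue_columns(family, hit, block, residue):
    """Columns within block = (start, end), end exclusive, where the hit
    carries `residue` but no family sequence does."""
    start, end = block
    return [c for c in range(start, end)
            if hit[c] == residue and all(seq[c] != residue for seq in family)]

# Toy block (hypothetical sequences).
family = ["AEELLKKA", "AEDLLRKA", "SEELIKKG"]
hit = "AEPLLKGG"
print(novel_residue_columns(family, hit, (0, 8), "P"))  # [2]
print(novel_residue_columns(family, hit, (0, 8), "G"))  # [6]
```

The Gly in the last column is not reported because one family member already has Gly there. Whether a reported column is tolerable still depends on its structural context, such as the helix N-terminus or an edge strand.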

    Is a segment rich in Gly, Pro, Asn, Ser aligned to a block poor in these residues?

This indicates that a loop region is erroneously aligned to a secondary structure element.
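
The comparison can be made quantitative by contrasting the Gly/Pro/Asn/Ser content of the hit segment with that of the family block. The sketch below is illustrative only: the function names are hypothetical and no particular cutoff for the excess is implied.

```python
LOOP_RICH = set("GPNS")

def loop_fraction(seq):
    """Fraction of non-gap residues that are Gly, Pro, Asn or Ser."""
    residues = [r for r in seq if r != "-"]
    if not residues:
        return 0.0
    return sum(r in LOOP_RICH for r in residues) / len(residues)

def loop_excess(family_block, hit_segment):
    """Hit loop fraction minus the mean loop fraction of the family block."""
    family_mean = sum(loop_fraction(s) for s in family_block) / len(family_block)
    return loop_fraction(hit_segment) - family_mean

# Toy block (hypothetical sequences): the family block has no G/P/N/S.
family_block = ["VLAVIEKA", "ILAVVEKA"]
print(loop_excess(family_block, "GSPNGSKA"))  # 0.75
```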

    Are the matches to blocks consistent with the block secondary structure?

Secondary structures of matched blocks should be identical. In addition to the above rules, amino acid preferences may be indicative: e.g. aligning a sequence composed of β-preferring residues like Ile, Val, Thr, Ser onto an α-helix would be highly implausible (unless these were already favoured in the aligned sequences).

    Have new insertions/deletions appeared in conserved regions?

Alignment blocks are usually conserved due to structural or functional constraints; large or frequent insertions and deletions within them are therefore unlikely.
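
New insertions and deletions inside a block can be counted directly from the alignment. A minimal sketch, assuming the hit is part of the same alignment as the family, gaps are written as "-", and the block is a half-open column range (the function name and sequences are hypothetical):

```python
def new_indels_in_block(family, hit, block):
    """Return (deletions, insertions) within block = (start, end), end exclusive.
    Deletion: hit has a gap where no family member does.
    Insertion: hit has a residue where every family member has a gap."""
    start, end = block
    deletions = insertions = 0
    for c in range(start, end):
        family_gaps = [seq[c] == "-" for seq in family]
        if hit[c] == "-" and not any(family_gaps):
            deletions += 1
        elif hit[c] != "-" and all(family_gaps):
            insertions += 1
    return deletions, insertions

# Toy alignment (hypothetical sequences).
family = ["MKVL-AEDG", "MRIL-SEDG"]
hit = "MK--QAEDG"
print(new_indels_in_block(family, hit, (0, 9)))  # (2, 1)
```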

    For Cys-rich sequences, do the Cys patterns match?

Number and spacing of Cys residues distinguish between classes of extracellular disulphide-rich modules, as well as (often with His) intracellular zinc fingers, e.g. the GAL4 example.
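
Cys number and spacing can be compared by reducing each sequence to the list of gaps between consecutive Cys residues. The sketch below is an illustration on ungapped toy sequences: the function name and sequences are hypothetical, and real module families usually tolerate some variation in spacing, so exact equality is a deliberately strict assumption.

```python
def cys_spacings(seq):
    """Number of residues between each pair of consecutive Cys in seq."""
    positions = [i for i, r in enumerate(seq) if r == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]

# Toy sequences (hypothetical).
query = "ACDECKLMNPQCA"
hit = "GGCSTCAAAAAACG"
other = "ACAAACAC"
print(cys_spacings(query))  # [2, 6]
print(cys_spacings(hit))    # [2, 6]
print(cys_spacings(other))  # [3, 1]
```

Here the query and hit share the same pattern (C-x2-C-x6-C), whereas the third sequence does not.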

    Are the functions of the hits compatible?

On the one hand, one should not overinterpret results to fit a tempting functional context; on the other hand, some functional aspects (e.g. the query proteins are extracellular but the hit is a metabolic enzyme) should be considered.

    Does additional functional or biochemical information provide clues as to homology?

Already identified catalytic residues, disulphide bridges, mutation data etc. add constraints that can help to exclude false positives.