Neural Networks

As the amount of sequence data grows, it has become ever more important to increase the sensitivity of the methods used to analyse it. Studies of different biological sequence signals make it clear that these signals are often less exact than can easily be formulated in a PROSITE pattern. For instance, the patterns of signal peptides (which determine whether a protein is secreted or not) vary quite a lot. Until recently the most frequently used method was based on a profile. However, computer science has developed methods that are more powerful if used in the right way.

Neural networks

One such method is the artificial neural network (NN), but there are also other machine learning approaches that can be used.

One common feature of these methods is that they are data-driven, i.e. you do not have to specify exactly how the method should work. This is in contrast to the profile method, where you have to specify the exact rules for how the profiles should be constructed. Neural networks are instead trained to give the best possible predictions. Normally the NN is given two (or more) sets of sequences (or any other type of data) and is told to distinguish between them.
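As a rough illustration of what "trained to distinguish between two sets" can mean, the sketch below fits the simplest possible network, a single sigmoid unit with no hidden layer, to two small sets of fixed-length peptides. Everything in it is an assumption made for the example: the one-hot encoding, the function names (one_hot, predict, train), the learning rate and number of epochs, and the toy peptides themselves. Real applications use larger networks and far more data.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(window):
    """Encode a fixed-length peptide as a flat 0/1 vector (20 inputs per residue)."""
    vec = []
    for residue in window:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec

def predict(weights, bias, x):
    """Output of a single sigmoid unit: a score between 0 and 1."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(positives, negatives, epochs=200, rate=0.5, seed=0):
    """Fit the unit by stochastic gradient descent on the cross-entropy loss."""
    rng = random.Random(seed)
    data = [(one_hot(s), 1.0) for s in positives]
    data += [(one_hot(s), 0.0) for s in negatives]
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, target in data:
            # For a sigmoid unit with cross-entropy loss the gradient
            # with respect to each weight is (output - target) * input.
            error = predict(weights, bias, x) - target
            weights = [w - rate * error * xi for w, xi in zip(weights, x)]
            bias -= rate * error
    return weights, bias

# Toy data for illustration only: hydrophobic versus charged peptides.
positives = ["LLALLA", "ALLLAV", "VLALLL", "LAVLLA"]
negatives = ["KDEKRD", "EEKDKR", "DKREEK", "RKDEDK"]

weights, bias = train(positives, negatives)
score_hydrophobic = predict(weights, bias, one_hot("LLLAVL"))
score_charged = predict(weights, bias, one_hot("KKDEER"))
print(score_hydrophobic, score_charged)
```

Note that no rule about hydrophobicity was ever written down: the unit only sees which residues occur at which positions in the two sets, and the weights are adjusted until the two sets are separated.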

Things to keep in mind

An important thing to remember when applying NNs to protein sequences is that NNs are extremely stupid. They will learn whatever they are told to learn, so it is of the greatest importance to tell them to learn the right thing. This is often described as selecting the training sets correctly. Another problem is overtraining, i.e. that the NN learns rules that are specific to the training set but are not generally true. This can be detected, and so avoided, by using an unrelated test set. In biological applications this is often achieved by homology reduction, i.e. by including only one copy of each group of related sequences.
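The idea of homology reduction can be sketched as a greedy filter: go through the sequences in order and keep one only if it is not too similar to any sequence already kept. The sketch below is a simplification under stated assumptions. In practice similarity is measured with a proper sequence alignment; here the ratio from Python's difflib.SequenceMatcher is used as a crude stand-in, and the 0.7 threshold, the function names and the toy sequences are all chosen only for the example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def homology_reduce(sequences, threshold=0.7):
    """Greedily keep one representative of each group of similar sequences."""
    kept = []
    for seq in sequences:
        # Keep the sequence only if it is dissimilar to every representative so far.
        if all(similarity(seq, rep) < threshold for rep in kept):
            kept.append(seq)
    return kept

# Toy sequences used only to exercise the code.
sequences = [
    "MKKTAIAIAVALAGFATVAQA",
    "MKKTAIAIAVALAGFATVAQS",           # near copy of the first
    "MRLLPLLALLALWGPDPAAA",
    "MKKTALAIAVALAGFATVAQA",           # near copy of the first
    "MDSKGSSQKGSRLLLLLVVSNLLLCQGVVS",
]

reduced = homology_reduce(sequences)
for seq in reduced:
    print(seq)
```

The reduction should be applied before the data are divided into training and test sets. Otherwise a near copy of a training sequence can end up in the test set, and the measured performance will look better than it really is.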

One problem with NNs is that they cannot easily handle data of variable length, i.e. they are only suitable for patterns that can be seen in a continuous window of the sequence. For such problems, other machine learning methods, such as hidden Markov models, are better suited.
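A common way to work within the fixed-length restriction is to cut every sequence into overlapping windows of the same size, one window centred on each residue, and to present each window to the network separately. The sketch below shows one way this could be done. The function name, the requirement that the window size is odd, and the use of "X" as a padding character at the sequence ends are assumptions made for the example.

```python
def windows(sequence, size, pad="X"):
    """Return one fixed-length window centred on each residue of the sequence."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so that it has a centre")
    half = size // 2
    # Pad both ends so that the first and last residues also get a full window.
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]

for window in windows("MKTAY", 3):
    print(window)
```

Whatever the length of the protein, every window has the same number of residues, so the network always receives the same number of inputs. The price is that the network can only see patterns that fit inside a single window.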

Use of neural networks in biology

NNs have been used in many different applications in biology, including signal sequence prediction, secondary structure prediction and prediction of post-translational modifications.


Arne Elofsson
Last modified: Thu Oct 18 10:42:05 CEST 2001