1) On the following tree, identify the total number of gene duplication events and label the nodes that correspond to gene duplication events. Assuming both the human and gorilla genomes are complete and the entire gene set is known, what is the minimum number of gene loss events that can explain this tree? (3p)

    human 1--|
             |--|
    human 2--|  |
                |--|
    human 3-----|  |
                   |--|
    gorilla 1--|   |  |
               |---|  |--
    gorilla 2--|      |
                      |
    human 4-----------|

2) Why do common secondary structure prediction programs perform poorly on integral membrane proteins? (3p)

4) Give two examples of differences in the organisation of genes in the genome of a eukaryote compared to a prokaryote. (3p)

5) One step in identifying the location of a gene in a genomic sequence is to find an Open Reading Frame (ORF). What is an ORF? Give two reasons why it is usually necessary to use additional criteria to identify a gene in a eukaryotic genome sequence. (3p)

6) Assume that you have been given a nucleotide sequence about which very little is known. Which of the following databases would you use to search for information about it, and why? (3p)
   a) EMBL
   b) SwissProt
   c) Protein Data Bank (PDB)

7) How is PSI-BLAST used in protein structure prediction and modelling methods? Describe one important situation where PSI-BLAST cannot help you. (3p)

8) Why are E-values useful? What do they describe? How are they calculated in BLAST and in FASTA? (3p)

9) What is the difference in the dynamic programming algorithm between local and global alignments? (3p)

10) Why are local alignments mostly used for database searches? (3p)
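
The contrast between global and local alignment raised in questions 9 and 10 can be illustrated with a minimal sketch. It assumes a linear gap penalty and illustrative scores (+1 match, -1 mismatch, -2 gap); the function name `alignment_score` and its parameters are hypothetical and chosen only for this example. The global variant is in the style of Needleman-Wunsch and the local variant in the style of Smith-Waterman.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Return the best alignment score of strings a and b.

    Illustrative scoring only (assumed values, linear gap penalty).
    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of substrings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Global alignment penalises leading gaps in the first row and column;
    # local alignment leaves them at zero so an alignment can start anywhere.
    if not local:
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                # Local alignment never lets a cell drop below zero,
                # and the answer is the maximum over the whole matrix.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global alignment reads the score from the bottom-right corner.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("TTTACGTTT", "GGACGGG", local=False))
    print(alignment_score("TTTACGTTT", "GGACGGG", local=True))
```

In this sketch the two variants differ in only three places: the initialisation of the first row and column, the extra zero in the maximum, and where the final score is read from. When two sequences share only a short similar region, the local score stays positive while the global score is dragged down by the unrelated flanks.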