Lab 3

Exercises are due April 25

This week's lab concentrates on: working through a dynamic programming computation for alignment by hand; modifying the Perl alignment program so that it incorporates user-supplied match and penalty scores; and exercises to get you better acquainted with NCBI/GenBank and ExPASy/SwissProt.

Turn in your answers to all the exercises, both those given in this document and those specified here but located in a linked document.

Outline for In-Lab Discussion #3

Hand Alignment Computation

Perl program for computing similarity of two sequences.

  • Get the program needleman.pl


    Be sure you have it running, and try to understand how it works. This is a Perl version of the Needleman-Wunsch alignment algorithm that we studied in class, using the DP recurrence relations instead of an alignment graph. Even though you have not learned all the Perl constructs used in this program, you should be able to understand what the different parts of the program do, based on your prior exposure to some programming language.

  • Perl Exercise 1: Modify the program so that it asks the user for a match value V, a mismatch cost Cm, and an indel cost Im, reads these inputs from the keyboard, and assigns them to variables. Modify the program so that it finds the maximum value over all possible alignments of the two input strings, where the objective function is
    V x (number of matches) - Cm x (number of mismatches) - Im x (number of indels).
    
    You really have to understand the recurrences and how they work in order to make this modification in the program. Otherwise, you won't see what to modify in the code, or where.
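
    As a rough illustration of the recurrence behind this objective function, here is a minimal sketch. It is not needleman.pl itself: the names ask_number, best_alignment_value, and main, the default scores, and the two example strings are all assumptions made up for this sketch. Each cell takes the best of three choices: a match or mismatch on the diagonal, or an indel from above or from the left.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: prompt for a number on the given filehandle.
# An empty or non-numeric line, or end of input, keeps the default.
sub ask_number {
    my ($fh, $prompt, $default) = @_;
    print "$prompt [$default]: ";
    my $line = <$fh>;
    return $default unless defined $line;
    chomp $line;
    return $line =~ /^\s*(-?\d+(?:\.\d+)?)\s*$/ ? $1 + 0 : $default;
}

# Maximum of V*(matches) - Cm*(mismatches) - Im*(indels) over all
# alignments of $s and $t, computed with the DP recurrence.
sub best_alignment_value {
    my ($s, $t, $V, $Cm, $Im) = @_;
    my ($n, $m) = (length $s, length $t);
    my @D;
    # Boundary cells: a prefix aligned entirely against spaces.
    $D[$_][0] = 0 - $Im * $_ for 0 .. $n;
    $D[0][$_] = 0 - $Im * $_ for 0 .. $m;
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $same = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1);
            my $best = $D[$i - 1][$j - 1] + ($same ? $V : -$Cm);  # match or mismatch
            my $up   = $D[$i - 1][$j] - $Im;                      # space in $t
            my $left = $D[$i][$j - 1] - $Im;                      # space in $s
            $best = $up   if $up   > $best;
            $best = $left if $left > $best;
            $D[$i][$j] = $best;
        }
    }
    return $D[$n][$m];
}

sub main {
    my ($s, $t) = ('ACGTTGCA', 'ACTTGGA');   # illustrative strings only
    my $V  = ask_number(\*STDIN, 'Match value V',    1);
    my $Cm = ask_number(\*STDIN, 'Mismatch cost Cm', 1);
    my $Im = ask_number(\*STDIN, 'Indel cost Im',    1);
    print "Best alignment value: ",
          best_alignment_value($s, $t, $V, $Cm, $Im), "\n";
}

# Run the interactive part only when a keyboard is attached, so the
# subs can also be loaded and checked without prompting.
main() if !caller() && -t STDIN;
```

    The sketch subtracts Cm and Im inside the recurrence, so the user types the costs as non-negative numbers, matching the way the objective function is written above.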

    LCS Exercise: By setting V to 1, and Cm and Im to zero, the program will produce the length of the longest common subsequence (LCS) of the two sequences. That is, the alignment simply maximizes the number of matches that can be obtained, without regard for how many spaces and mismatches are involved. The length of the LCS of two strings is sometimes taken as a measure of their similarity. By making up "random" strings of length 20, say, and computing the LCS length of pairs of them, try to estimate the expected length of the LCS of two random strings. The expected length will also be discussed in class.
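
    If you would rather let the computer make up the strings, the experiment can be sketched roughly as follows. This swaps Perl's rand in for hand-made strings, and the four-letter DNA alphabet, the number of trials, and the names lcs_length, random_string, and mean_lcs are all assumptions of this sketch, not part of the assignment. The recurrence is the one above with V = 1 and Cm = Im = 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Length of the longest common subsequence of $s and $t, using the
# alignment recurrence with V = 1, Cm = 0, Im = 0 (two rows of the table).
sub lcs_length {
    my ($s, $t) = @_;
    my @prev = (0) x (length($t) + 1);
    for my $i (1 .. length($s)) {
        my @cur = (0);
        for my $j (1 .. length($t)) {
            my $same = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1);
            my $best = $prev[$j - 1] + ($same ? 1 : 0);   # diagonal
            $best = $prev[$j]    if $prev[$j]    > $best; # from above
            $best = $cur[$j - 1] if $cur[$j - 1] > $best; # from the left
            $cur[$j] = $best;
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# A string of $len characters drawn uniformly from @alphabet.
sub random_string {
    my ($len, @alphabet) = @_;
    return join '', map { $alphabet[rand @alphabet] } 1 .. $len;
}

# Average LCS length over $trials pairs of random strings of length $len.
sub mean_lcs {
    my ($len, $trials, @alphabet) = @_;
    my $total = 0;
    for (1 .. $trials) {
        $total += lcs_length(random_string($len, @alphabet),
                             random_string($len, @alphabet));
    }
    return $total / $trials;
}

# Assumed settings: DNA alphabet, length 20, 1000 trials.
printf "Estimated expected LCS length: %.2f\n",
       mean_lcs(20, 1000, qw(A C G T));
```

    The estimate changes from run to run and depends on the alphabet size, so try a few alphabets and string lengths and compare the results with what is discussed in class.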

    More on Sequence Databases

    Follow the link below to a page that contains links and exercises. Read/scan the documentation on SwissProt and then do Exercises 2.4, 2.5, 2.6, and hand in the solutions.

    Regular Expressions in Perl

    Start reading the Second Perl Notes, and do Exercises 1.1, 2.1, and 2.2 in those notes.