Lab 5, May 2

Exercises are due May 9

This week's lab concentrates on local alignment in Perl, more Perl for regular expressions, and an introduction to using BLAST.

Perl (and related) Exercises for the week:


Exercises
    Regular Expression exercises: Read the rest of the Second Perl Notes, and do exercises 2.12, 2.13, and 2.14. Read sections 4.3 and 4.4 in Johnson or, in some other Perl book, read about arrays if you haven't already done so.

    Dynamic Programming Exercises
    1. Starting from the original needleman.pl program, follow the lecture notes on the Smith-Waterman local alignment algorithm and modify the program so that it computes the optimal LOCAL alignment VALUE for two input strings. Score the alignment of the two substrings chosen by the optimal local alignment using 1 for a match and -1 for each mismatch and each space. Call that version of the program sw.pl. This should require only a handful of small changes. The major one is that you have to keep track of the maximum cell value as you fill in the table, and then report that maximum at the end instead of reporting V[n][m]. Cat the program, and run it on the strings attaacggt and agaagga. Script the result. Find a traceback by hand using the result of the program. You can check your result using XPARAL (how?).
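
    The change described above (track the maximum cell value, and never let a cell drop below zero) can be sketched as follows. This is an illustration in Python, not the Perl sw.pl the exercise asks for; the function name local_alignment_value and its parameter names are assumptions made for this sketch, and it computes the value only, with no traceback.

```python
def local_alignment_value(s, t, match=1, mismatch=-1, space=-1):
    """Return the optimal local alignment value of s and t (value only)."""
    n, m = len(s), len(t)
    # V[i][j] = best value of a local alignment ending at s[i-1] and t[j-1].
    # Row 0 and column 0 stay 0: the empty alignment is always allowed.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0  # maximum cell value seen so far, reported instead of V[n][m]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = V[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = V[i - 1][j] + space      # s[i-1] aligned with a space
            left = V[i][j - 1] + space    # t[j-1] aligned with a space
            V[i][j] = max(0, diag, up, left)
            best = max(best, V[i][j])
    return best
```

    For example, local_alignment_value("acgt", "acgt") returns 4 (four matches), and local_alignment_value("aaa", "ttt") returns 0 (the empty alignment is the best available).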

    2. Now modify your local alignment program (or start from the modified needleman.pl you did in an earlier lab) so that it asks the user for a match value and for mismatch and space penalties. Once that program is developed, you can use it to compute the length of the longest common substring of two strings by selecting an appropriate match value and appropriate mismatch and space penalties. State one appropriate combination of parameters. Then use your program with those parameters to find the length of the longest common substring of attcacgactggta and tatacgacgacttacaggct.
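
    One way to check the answer your alignment program gives is to compute the longest common substring length directly. The sketch below is an independent cross-check in Python, not the parameterized alignment program the exercise asks for; the function name longest_common_substring_length is an assumption made for this sketch.

```python
def longest_common_substring_length(s, t):
    """Return the length of the longest substring that occurs in both s and t."""
    n, m = len(s), len(t)
    # L[i][j] = length of the longest common suffix of s[:i] and t[:j];
    # it is 0 whenever s[i-1] and t[j-1] differ.
    L = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
                best = max(best, L[i][j])
    return best
```

    If your parameter choice is appropriate, the value your program reports should agree with this function on any pair of strings you try.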

    Introduction to using BLAST

    Blast:

    First, you will read and work your way through some of NCBI's extensive materials on Blast. Go to the NCBI homepage.


    Note: Finding all the NCBI material may take a little trial and error: there are many redundant materials, NCBI keeps reorganizing them, and the links are not always updated or the materials kept in agreement. NCBI has also recently introduced a new graphical interface for BLAST. But after reading some basic materials about BLAST, you will get the idea and be able to use it even if it doesn't look exactly the way it does in the materials. In navigating the NCBI web materials, the best advice is to have patience and explore. Below is a scenario that recently worked:

    From the NCBI homepage, click on Education, towards the bottom of the list of links on the left; that should bring you to a page where you can select an icon at the bottom for Blast Information. There, read the Query Tutorial. Be sure to read the section on FASTA format (trivial but important). Then start the BLAST tutorial. It is long and somewhat confusing, but read enough (looking at the details as well) that you can do an actual BLAST query using the Methanococcus sequence shown there (or use its GI number or accession number). That is, open a real BLAST search window, specify that sequence, and specify the parameters suggested in the tutorial. Then do the search and examine the output.
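
    As a reminder of what FASTA format looks like: each record is a header line beginning with ">" followed by one or more lines of sequence. The sketch below, in Python, reads such text into (header, sequence) pairs. The function name parse_fasta and the record shown are made up for illustration; the record is not the Methanococcus sequence from the tutorial.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record, for illustration only.
example = """>example_id hypothetical protein
MKTAYIAKQR
QISFVKSHFS
"""
```

    Here parse_fasta(example) gives one record whose header is "example_id hypothetical protein" and whose sequence is the two sequence lines joined together.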

    There used to be a page on Deciphering Blast output somewhere in the tutorials. I don't know where it is now; try to find it. In any case, look at the output you get and report the accession number of the eighth-highest-scoring hit. What are the S score and E value for that hit? Click on the link to this protein to get more details. What is its GenBank accession number, and who is the lead author on the cited paper?

    As for the ideas underlying Blast, you can scan the first part of the following set of notes. We will discuss these in class, so this is optional for now.

    Rapid Similarity Searching (BLAST)

    However, the parts of the notes on how to *use* Blast are somewhat old, and the NCBI blast tutorials are more current.