Regular Expression Exercises
Read the rest of the Second Perl Notes, and do exercises 2.12, 2.13, and 2.14.
Read sections 4.3 and 4.4 in Johnson (or read about arrays in some other Perl book) if you
haven't already done so.
Dynamic Programming Exercises
- Starting from the original needleman.pl program,
follow the lecture notes on the Smith-Waterman local alignment
algorithm to modify needleman.pl so that it
computes the optimal LOCAL alignment VALUE for two input
strings. Score 1 for each match and -1 for each mismatch and each space in the alignment
of the substrings used in the optimal local alignment. Call that version of the program sw.pl.
This should
require only a handful of small changes. The major one is that you
have to keep track of the maximum cell value as you fill in the
table, and then report that value at the end instead of reporting
V[n][m]. Cat the program, and run it on the strings
attaacggt and agaagga. Script the result. Find a traceback by hand using the
result of the program. You can check your result using XPARAL (how?).
- Now modify your local alignment program (or start from the modified needleman.pl you wrote in
an earlier lab)
so that the modified program asks the user for a match value and for mismatch and space penalties.
Once that program is developed, you can use it to compute the length of the longest common substring between
two strings by selecting an appropriate match value and mismatch and space penalties. State
one appropriate combination of parameters. Then use your program with those parameters to find
the length of the longest common substring between attcacgactggta and tatacgacgacttacaggct.
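To make the changes described in the two exercises above concrete, here is a minimal Python sketch of the same logic. It is not a substitute for the Perl sw.pl you are asked to write. It uses the standard Smith-Waterman recurrence: each cell is floored at 0, and the maximum cell is tracked while the table is filled. It then traces back from that cell to recover one optimal local alignment. Assumptions to note: the function names `fill_local_table` and `traceback` are hypothetical. Match, mismatch, and space are passed as signed scores (negative numbers for penalties), which may differ from how your needleman.pl stores them. Ties in the traceback are broken diagonal first, then up, then left.

```python
"""Sketch: Smith-Waterman local alignment value plus one traceback."""


def fill_local_table(s, t, match, mismatch, space):
    """Fill the local alignment table V for strings s and t.

    match, mismatch and space are signed scores, e.g. 1, -1, -1.
    Returns (V, best, best_cell): best is the largest value in V and
    best_cell = (i, j) is the first cell, row by row, that holds it.
    """
    n, m = len(s), len(t)
    # Row 0 and column 0 stay 0: an empty local alignment scores 0.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            # Same three moves as global alignment, plus a floor of 0,
            # so a negative-scoring prefix is dropped rather than carried.
            V[i][j] = max(0,
                          V[i - 1][j - 1] + pair,  # s[i-1] with t[j-1]
                          V[i - 1][j] + space,     # s[i-1] against a space
                          V[i][j - 1] + space)     # t[j-1] against a space
            # Track the maximum while filling, instead of reading V[n][m].
            if V[i][j] > best:
                best, best_cell = V[i][j], (i, j)
    return V, best, best_cell


def traceback(V, s, t, cell, match, mismatch, space):
    """Walk back from cell until a 0 cell is reached and return one optimal
    local alignment as two strings, with '-' marking a space."""
    i, j = cell
    top, bottom = [], []
    while V[i][j] > 0:  # a positive cell always has i >= 1 and j >= 1
        pair = match if s[i - 1] == t[j - 1] else mismatch
        if V[i][j] == V[i - 1][j - 1] + pair:    # came diagonally
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif V[i][j] == V[i - 1][j] + space:     # came from above
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        else:                                    # must have come from the left
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    # Small illustrative strings, not the lab's assigned inputs.
    s, t = "acgt", "agt"
    V, best, cell = fill_local_table(s, t, match=2, mismatch=-1, space=-1)
    print(best, traceback(V, s, t, cell, 2, -1, -1))  # 5 ('acgt', 'a-gt')
```

A different tie-breaking order can produce a different but equally optimal alignment, so a traceback you find by hand need not match this one exactly. Reading the three scores from the user (for example with `input()` and `float()`) is omitted so the sketch stays non-interactive.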
Introduction to using BLAST
Blast:
First, you will read and work your way through some of NCBI's extensive materials
on BLAST. Go to the NCBI homepage.
Note: Finding all the NCBI material may take a little trial and error. There are
many redundant materials, NCBI keeps
reorganizing them, and it does not always update its links or keep the materials consistent. It has
also recently introduced a new graphical interface for BLAST. But after reading some basic materials
about BLAST, you will get the idea and be able to use it even if it doesn't look exactly
the way it does in the materials. In navigating NCBI's web materials, the best advice is to
be patient and explore. Below is a scenario that recently worked:
From the NCBI homepage, click on
Education, towards the bottom of the list of links on the left. That should bring you to a page where you
can select an icon at the bottom for Blast Information. There, read the Query Tutorial. Be sure to read
the section on FASTA format (trivial but important).
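FASTA format is a single description line beginning with `>`, followed by one or more lines of sequence. As an illustration only, the short Python sketch below writes and reads that layout. The function names `to_fasta` and `parse_fasta` and the header text are made up for this example. The default 60-character line width is a common choice, not a requirement of the format. The example sequence is one of the strings from the dynamic programming exercise.

```python
def to_fasta(header, sequence, width=60):
    """Format one sequence as a FASTA record: a '>' description line,
    then the sequence wrapped to `width` characters per line."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs.
    Blank lines are skipped; in this sketch, sequence lines that appear
    before the first '>' line are ignored."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:  # finish the last record
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    record = to_fasta("example query (hypothetical header)", "attcacgactggta", width=10)
    print(record, end="")
    print(parse_fasta(record))
```

With `width=10`, the record prints as the header line followed by `attcacgact` and `ggta`, and parsing it returns the original header and the unwrapped sequence.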
Then start the BLAST tutorial. It is long and somewhat confusing, but read enough (look at the details
too) that you can do an actual BLAST query using the Methanococcus sequence shown there (or using
its GI number or accession number). That is, open a real BLAST search window, specify
that sequence, and set the parameters suggested in the tutorial. Then run the search and
examine the output.
There used to be a page somewhere in the tutorials on
Deciphering Blast output. I don't know where it is now; try to find it. In any case, look at the
output you get and answer the following:
- Report the accession number of the eighth-highest-scoring hit obtained. What are the S score and E
value for that hit?
- Click on the link to this protein to get
more details. What is its GenBank accession number, and
who is the lead author on the cited paper?
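For reference when reading the S score and E value, here is a brief summary of the standard Karlin-Altschul statistics that BLAST uses. They are exact for ungapped local alignments and are applied with estimated parameters for gapped ones. For a query of length m and a database of total length n, the expected number of chance hits with raw score at least S is E, where K and λ depend on the scoring system. The normalized bit score S' removes that dependence. In NCBI's text output, the raw score usually appears in parentheses after the bit score. BLAST also substitutes effective (edge-corrected) lengths for m and n, so recomputing E by hand from these formulas will only approximate the reported value.

```latex
\begin{aligned}
E  &= K\,m\,n\,e^{-\lambda S} \\
S' &= \frac{\lambda S - \ln K}{\ln 2} \\
E  &= m\,n\,2^{-S'}
\end{aligned}
```

The third line follows from the first two, since $2^{-S'} = e^{-(\lambda S - \ln K)} = K e^{-\lambda S}$. In practice, each extra bit of score halves E, and searching a database twice as large doubles E for the same hit.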
As for the ideas underlying Blast, you can scan the first part of the
following set of notes. We will discuss these in class, so this is optional
for now.
Rapid Similarity Searching (BLAST)
However, the parts of those notes on how to
*use* BLAST are somewhat dated; the NCBI BLAST tutorials are more
current.