Lab 2, April 11

Exercises are due April 18

This week's lab concentrates on several things: doing the ELVIS exercise; reading the rest of the first Perl notes (already distributed) and understanding the examples in them; doing the Perl exercises at the end of this document; and using parametric alignment to get a feel for alignment models and parameters.

What you need to turn in are your answers to all the exercises detailed in this document, and to the Perl exercises from 1.9 to the end of the first Perl notes.

Outline for In-Lab Discussion #2

Elvis Lives Example

Here is an interesting aside. It is reported that the words "elvis" and "lives" each appear multiple times within protein sequences held in several protein databases (Swiss-Prot, for example), but that they never appear together. We want to know if this is true.

To answer this question, you need to scan the sequence content of a protein database. Unfortunately, Entrez does not allow you to do this (later we will use BLAST, which can be used for related kinds of searches but doesn't work for this one). However, there is a web/database tool that will work for this and will be useful for several other tasks in the class.

Myhits

Exercise: Use Myhits to find how many times ELVIS appears in the protein database Swiss-Prot, how many times LIVES appears, and how many times they appear consecutively. Do ELVIS and DEAD appear together? How many times does PERL appear? Does PERLISGREAT appear? What would it mean if any of these longer statements did appear in the protein files?

To use Myhits for this exercise, click on Query (the word), then in the menu choose Pattern Search. Enter E-L-V-I-S in the Pattern Input window (unfortunately you do need to put the dashes between successive characters). Choose the database Swiss-Prot and hit search. The search can take a minute or so, but should bring up a Results page listing the number of matches and the details of each match.
For amusement, read The Scientist article (listed in the papers list) about ELVIS searching. How do the results reported there compare to the results you just obtained?

You might also want to determine whether your name appears in the database.
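
To get a feel for what such a pattern search does, here is a minimal Perl sketch that counts occurrences of a word in a handful of sequences. The sequences in %seqs and the helper count_occurrences are made up for illustration; they are not Swiss-Prot entries, and this is not how Myhits itself is implemented. It also assumes that "consecutively" means the string ELVISLIVES, which is only one reading of the exercise.

```perl
#!/usr/bin/perl -w
use strict;

# Toy sequences, invented for illustration only (not real Swiss-Prot entries).
my %seqs = (
    toy1 => "MKELVISAGLIVESTT",
    toy2 => "AAELVISLIVESKK",
    toy3 => "GGPERLAAELVIS",
);

# Count occurrences of $word in $seq, including overlapping ones.
sub count_occurrences {
    my ($word, $seq) = @_;
    my $count = 0;
    my $pos   = 0;
    while (($pos = index($seq, $word, $pos)) != -1) {
        $count++;
        $pos++;    # step one character on so overlapping hits are found too
    }
    return $count;
}

for my $word ("ELVIS", "LIVES", "ELVISLIVES") {
    my $total = 0;
    $total += count_occurrences($word, $seqs{$_}) for keys %seqs;
    print "$word: $total\n";
}
```

On these toy sequences the program prints 3 for ELVIS, 2 for LIVES, and 1 for ELVISLIVES. Running the same idea over a real database would require reading the sequences from a file, which is not shown here.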

Sequence alignments with scoring matrices

Parametric Alignment

XPARAL

What you need to know about XPARAL will be discussed in the lab. But you can also read about XPARAL and parametric alignment at:

From the link above, you go to a page that has a link to DOCUMENTATION, and the documentation contains an introduction to parametric alignment, a discussion of the program controls for XPARAL, and a tutorial. Be sure to go through those, including the tutorial, if you do not understand from the lab how to use XPARAL. You can also view video lecture 4, which discusses XPARAL, but it is not the clearest discussion of how to use XPARAL.

You need the program XPARAL to do these exercises. It is installed on the Kemper Hall csif machines. XPARAL is X-based, so you can access and use it via the hutchison machines by using PuTTY to log into the csif machines and Xming to display X-based software. If you have PuTTY and Xming on your home machine, you can use XPARAL that way, or you can go to Kemper Hall and use a csif machine in the basement. For the hardy, if you have Linux at home: there is a Red Hat Linux version of XPARAL that you can download to your home Linux system, but it is a bit old and may have some problems. To download, go to Software.

 

XPARAL Exercise: Recreating the classic Fitch-Smith experiment of 1983

This is from the paper "Optimal sequence alignments", PNAS Vol. 80, pp 1382-1386, March 1983.

They examined parts of the chicken alpha hemoglobin and chicken beta hemoglobin. They believed that the correct alignment of these strings is:

FASFGNLSSPTAILGNPMV
FPHF-DLSH-----GSAQI

Exercise: Type or paste the sequences (removing the spaces) into the two XPARAL sequence windows. Choose global alignment with no scoring matrix and the first opt setting, i.e. without a gap term in the objective function. Find all the polygons. Examine each polygon and use the Next and Prev keys to scan the co-optimal alignments. How many are there? Do any match the one that Fitch and Smith believed correct? Now change the opt function to the fourth choice (the first choice with the GAP term) and rerun the analysis. What do you see now in terms of the number of polygons, whether any contain the correct alignment, the size of that polygon (if there is one), and the number of co-optimal alignments in that polygon? Why are the number of co-optimal alignments and the size of the polygon important?
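
When comparing alignments, it can help to count the features that an objective function weighs. The Perl sketch below tallies matches, mismatches, indels, and gaps for the alignment given above. The helper alignment_features is hypothetical. The objective form (matches minus weighted mismatches, indels, and gaps) is an assumption based on a common parametric-alignment formulation and may not be exactly what XPARAL's opt settings compute, and the weights are arbitrary values, not XPARAL defaults.

```perl
#!/usr/bin/perl -w
use strict;

# The alignment Fitch and Smith believed correct, as given in the text.
my $top    = "FASFGNLSSPTAILGNPMV";
my $bottom = "FPHF-DLSH-----GSAQI";

# Tally matches, mismatches, indels (columns containing '-'), and gaps
# (maximal runs of consecutive '-' characters in the same row).
sub alignment_features {
    my ($s, $t) = @_;
    die "rows differ in length\n" unless length($s) == length($t);
    my %f = (match => 0, mismatch => 0, indel => 0, gap => 0);
    my $prev = '';    # which row held the '-' in the previous column, if any
    for my $i (0 .. length($s) - 1) {
        my ($x, $y) = (substr($s, $i, 1), substr($t, $i, 1));
        my $row = $x eq '-' ? 'top' : $y eq '-' ? 'bottom' : '';
        if ($row) {
            $f{indel}++;
            $f{gap}++ if $row ne $prev;    # a new run of '-' starts here
        } elsif ($x eq $y) {
            $f{match}++;
        } else {
            $f{mismatch}++;
        }
        $prev = $row;
    }
    return %f;
}

my %f = alignment_features($top, $bottom);
print "matches=$f{match} mismatches=$f{mismatch} indels=$f{indel} gaps=$f{gap}\n";

# Hypothetical weights, chosen only to show how a parameterized objective
# value would be computed.
my ($alpha, $beta, $gamma) = (1, 1, 2);
my $value = $f{match} - $alpha * $f{mismatch} - $beta * $f{indel} - $gamma * $f{gap};
print "objective value = $value\n";
```

For this alignment the counts are 5 matches, 8 mismatches, 6 indels, and 2 gaps. Changing the weights changes which alignment scores best, which is the dependence that the polygons in XPARAL display.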

Now we go a step beyond the classic experiment: Choose the PAM250+8 scoring matrix and get the polygons. Examine the result and state your conclusions about good alignment models. To specify the scoring matrix, open the menu under Scoring Schemes, and choose PAM250+8. This then brings up a window where W_PAM250P8 is written (I think). Unfortunately, you now need to move the cursor to just before the W, and type in /usr/local/t/ so that what is on that line is /usr/local/t/W_PAM250P8 -- then press OK and if you do not get an error message, the alignments will be done using the PAM250+8 scoring matrix. We will talk about the use of gaps and scoring matrices in lecture on Tuesday.

If you really have a lot of time and energy, check out the paper. You will see that they also did an analysis of the underlying mRNA sequences. Try to recreate that analysis. I don't expect anyone to really do this.

 

More Perl

Exercise: Try out countbadly.pl and countgoodly.pl and find, for each program, the largest value of n for which it can compute T(n). Try to explain why there is a difference in their performance. These programs are also available from the announcements page.

  • countbadly.pl
  • countgoodly.pl
  • Even though we have not explained for-loops in Perl, you should be able to understand the loop structure and syntax in the program countgoodly.pl. If you have questions, consult the appendix section on loops. Notice that there are no declarations needed in Perl. What the program reads into $n is treated either as a string (for example, if you want to change all 0's to 1's in the value of $n using the tr/// operator), or as an integer (if you want to do arithmetic using $n as we do in this program). Perl just figures out how to handle $n by context - ain't it great! Notice also that I have added -w in the top line. That tells the interpreter to print warnings about questionable constructs, such as using a variable that has no value yet. It's a good thing to use.
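
The points about context and loops can be seen in a small standalone sketch. This is not countgoodly.pl. The value of $n and the other variable names are made up for illustration. The sketch also adds "use strict" and "my", which are optional in Perl but catch misspelled variable names.

```perl
#!/usr/bin/perl -w
use strict;

my $n = "1001";    # hypothetical value; the lab programs read $n as input

# String context: tr/// rewrites characters in a copy of the value.
(my $ones = $n) =~ tr/0/1/;
print "as a string, with 0's changed to 1's: $ones\n";

# Numeric context: the same $n takes part in arithmetic.
my $next = $n + 1;
print "as a number, plus one: $next\n";

# A C-style for-loop, summing the integers 1 through 10.
my $sum = 0;
for (my $i = 1; $i <= 10; $i++) {
    $sum += $i;
}
print "sum of 1..10: $sum\n";
```

The same scalar $n is used as the string "1001" by tr/// (giving 1111) and as the number 1001 by the addition (giving 1002), with no conversion written in the program.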

Perl Exercises from the Perl notes

Read the remaining part of the first Perl notes, and do the exercises starting with 1.9 until the end of the notes.