This week's lab concentrates on two things: working through the first Perl notes and doing the exercises in them, and doing a few exercises to get you acquainted with NCBI, Entrez, Coffee Break, MEDLINE, and eMOTIF.
The first Perl notes are in Intro To Perl 1 (pdf).
For this class you will need two computer accounts: one for the campus IT machines (for example, in Hutchison), and another for the Computer Science CSIF machines, which you can use remotely or in person in the basement of Kemper Hall. All of this will be explained in the first lab in 73 Hutchison on Friday, April 4.
During the one-hour lab session itself we will get you started with your accounts, and get you started on two general tasks.
What you need to turn in are your answers to exercises 1.1, 1.2, 1.3, 1.5, 1.6, 1.7, and 1.8 in the first set of Perl notes, and your answers to all the exercises written below.
You will want to use the Unix script tool to hand in your Perl programs. If you don't know what that is, look at the following.
What to Hand In: Scripting for Beginners
Each student will receive account information for the course and use it to log in to both accounts. If you need assistance in using Unix, check out Norm Matloff's Unix tutorial:
Matloff's Tutorial
If you don't already know a Unix text editor, check out the link to Norm Matloff's site on Unix (see link above). There is a nice tutorial on the vi editor towards the beginning.
At this point we will just get acquainted with Entrez, PubMed, MeSH, and Coffee Break.
Quote from NCBI: Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information - all for the better understanding of molecular processes affecting human health and disease.
NCBI will be one of the major web-based resources we will use in this class. NCBI has several online tutorials that make very good introductions. We will specify a few during the class, but you might also explore there at your leisure. A feature of NCBI that I like for showing how biologists get value from the tools and databases is Coffee Break.
What It Is: (Quote from NCBI)
Coffee Break is a collection of short reports on recent biological discoveries. Each report incorporates interactive tutorials that show how bioinformatics tools are used as a part of the research process. Reading the past coffee breaks and keeping up with the new ones as they are posted is a great way to learn about a range of bioinformatics tools, and see how they were applied to specific biological problems.
Exercise: Read the Coffee Break article called "Do brains have a freshness date". Four genes are mentioned in that article. For each one, follow the Entrez link in the Coffee Break article to see what kind of information is available in Entrez about that gene.
Three important links found on the right side of the Entrez page are Protein, Nucleotide, and HomoloGene. The Protein link leads to a page that gives information about the protein product of that gene, and its full protein sequence. The Nucleotide link leads to other pages that give information about the DNA sequence of that gene and related sequences. You have to keep the name of the gene and protein in mind to get to the most relevant sequence information. HomoloGene leads to a page that shows genes and proteins in other species that are related to the human gene mentioned in the article. You may find information there about pairwise similarity scores, pairwise alignments, multiple alignments of many sequences, percent identity, and so on.
Try to find the protein and DNA sequence information for the genes and proteins mentioned in the Coffee Break article. For each relevant sequence found, write down its identification number (for example, a GenBank accession number) and the number of residues (characters) in that sequence.
Follow the PubMed link in the Coffee Break article to copy and turn in the abstract of the Lu et al. article that is cited there.
From the Coffee Break article, and other articles you find linked to it, what did you learn about how genes and proteins involved in human brain aging are related to genes in other organisms? This is really an open-ended question because there are so many connections and links that one could follow.
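If you save a sequence from Entrez in FASTA format, you can double-check the residue count asked for in the exercise above with a few lines of Perl. What follows is only a minimal sketch: the sub name residue_count and the toy record are made up for illustration (the record is not a real GenBank entry), and it assumes a plain FASTA record with one header line beginning with ">".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the residues in one FASTA-formatted record.
# The header line (starting with ">") is skipped, whitespace is removed,
# and the remaining characters are counted.
sub residue_count {
    my ($fasta_text) = @_;
    my $sequence = '';
    for my $line (split /\n/, $fasta_text) {
        next if $line =~ /^>/;    # skip the FASTA header line
        $line =~ s/\s+//g;        # drop spaces and stray carriage returns
        $sequence .= $line;
    }
    return length $sequence;
}

# A made-up toy record, NOT a real database entry.
my $record = <<'END';
>toy_protein made-up example for illustration only
MKTAYIAKQR
QISFVKSHFS
END

print "Residues: ", residue_count($record), "\n";
```

To use it on your own data, replace the toy record with the text of a sequence you downloaded, and compare the result with the length that Entrez reports.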
Entrez is the integrated information retrieval system for the NCBI databases. Using Entrez you can look up literature references, protein and nucleic acid sequences, even protein structures and population statistics. There has been a big push in information retrieval towards integrated systems.
PubMed is a publicly accessible search interface to the MEDLINE database. The MEDLINE database contains abstracts for over 10 million articles, mostly related to the life sciences. From MEDLINE it is sometimes possible to reach the publisher's site and download the full article.
Here is an interesting aside. It is reported that the words "elvis" and "lives" each appear as part of several protein sequences held in several protein databases (SwissProt, for example). Each appears multiple times, but they never appear together. We want to know if this is true.
To answer this question you would want to scan the sequence content of a protein database. Unfortunately, Entrez does not allow you to answer this question (later we will use BLAST, which can be used for related kinds of searches but doesn't work for this one). However, there is a nifty tool at Stanford called eMOTIF SCAN that will work.
Exercise: Use eMOTIF SCAN to find how many times ELVIS appears in the protein database SWISS-PROT, how many times LIVES appears, and how many times they appear consecutively. Do ELVIS and DEAD appear together? How many times does PERL appear? Does PERLISGREAT appear? What would it mean if any of these longer statements did appear in the protein files?
To use eMOTIF SCAN, write your string in the box where the program calls for a regular expression. We will study regular expressions in more detail later. For now, the simplest regular expression is a simple string like ELVIS.
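To see what "a simple string as a regular expression" means in Perl terms, here is a minimal sketch that counts how often a string occurs in a sequence. It is not how eMOTIF SCAN works internally, and it does not search SWISS-PROT: the sub name count_motif and the three toy sequences are invented for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a literal
# string in one sequence. \Q...\E makes Perl treat the motif as plain
# text rather than as regular expression syntax.
sub count_motif {
    my ($motif, $sequence) = @_;
    my $count = 0;
    $count++ while $sequence =~ /\Q$motif\E/g;
    return $count;
}

# Made-up toy sequences, NOT real SWISS-PROT entries.
my %toy_db = (
    toy1 => 'MKELVISAAGLIVESKK',
    toy2 => 'MELVISLIVESQ',
    toy3 => 'MPERLKKPERL',
);

for my $id (sort keys %toy_db) {
    my $seq = $toy_db{$id};
    printf "%s: ELVIS=%d LIVES=%d ELVISLIVES=%d PERL=%d\n",
        $id,
        count_motif('ELVIS',      $seq),
        count_motif('LIVES',      $seq),
        count_motif('ELVISLIVES', $seq),
        count_motif('PERL',       $seq);
}
```

Searching for the two words "consecutively" is just a search for the joined string ELVISLIVES, which is why the sketch treats it as one more motif.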
For amusement, read The Scientist article (listed in the papers list) about ELVIS searching. How do the results reported there compare to the results you just obtained? You might also want to determine whether your name appears in the database.