The BLAST Report

The BLAST report is divided into several logical sections, each explained below. While the example below is the result of a blastp search, results from the other programs are analogous. BLAST results can be retrieved as plain text or formatted as HTML for your browser. A flat text file is returned when using the stand-alone version of BLAST or the BLAST e-mail server, and may be preferable if any user-implemented post-processing is planned.

HEADER

The BLAST report begins with header information that lists the type of program, the version, and a release date. Following that is a reference for the BLAST program; this is what should be cited if search results are published.

Finally, the full definition field for the query sequence and a summary of the database searched are shown.

BLASTP 2.0.10 [Aug-26-1999]

Reference:
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= gi|4885477|ref|NP_005359.1|pMB| myoglobin
         (154 letters)

Database: Non-redundant SwissProt sequences
           82,258 sequences; 29,652,561 total letters

If you have any problems or questions with the results of this search
please refer to the BLAST FAQs

ONE LINE DESCRIPTIONS

The second section of the report consists of one-line descriptions of the top database matches. Each one-line description includes a database sequence identifier, the definition line, the normalized score S' (in bits), and the statistical significance ('E value') of the match.

The first match, in this case, is the query sequence itself. Its fields are interpreted as follows:

Identifier: sp|P02144|MYG_HUMAN
Description: MYOGLOBIN
Normalized Score: 319
Expect Value: 2e-88

                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value
sp|P02144|MYG_HUMAN MYOGLOBIN                                         319  2e-88
sp|P02008|HBAZ_HUMAN HEMOGLOBIN ZETA CHAIN                             70  2e-13
sp|P02096|HBG_HUMAN HEMOGLOBIN GAMMA-A AND GAMMA-G CHAINS              57  1e-09
sp|P01922|HBA_HUMAN HEMOGLOBIN ALPHA CHAIN                             52  5e-08
sp|P09105|HBAT_HUMAN HEMOGLOBIN THETA-1 CHAIN                          52  5e-08
sp|P02100|HBE_HUMAN HEMOGLOBIN EPSILON CHAIN                           52  5e-08
sp|Q03001|BPA1_HUMAN BULLOUS PEMPHIGOID ANTIGEN 1 (BPA) (HEMIDES...    28  0.87
sp|P51787|CIQ1_HUMAN VOLTAGE-GATED POTASSIUM CHANNEL PROTEIN KQT...    27  1.5
sp|P04114|APB_HUMAN APOLIPOPROTEIN B-100 PRECURSOR (APO B-100) [...    27  2.6
sp|Q15109|RAGE_HUMAN ADVANCED GLYCOSYLATION END PRODUCT-SPECIFIC...    26  4.4
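For post-processing a text report, each one-line description can be split into the four fields described above. The following is a minimal sketch, assuming the layout shown in this example (identifier first, bit score and E value as the last two whitespace-separated tokens); the names `HitSummary` and `parse_one_line_description` are illustrative and not part of BLAST.

```python
from typing import NamedTuple


class HitSummary(NamedTuple):
    identifier: str
    description: str
    bit_score: float
    e_value: float


def parse_one_line_description(line: str) -> HitSummary:
    """Split one summary line into identifier, description, score and E value."""
    # The two numeric fields always come last, so peel them off from the right.
    head, score, e_value = line.rsplit(None, 2)
    # Some BLAST versions print very small E values without a leading
    # digit (for example "e-100"); prepend "1" so float() accepts them.
    if e_value.startswith("e"):
        e_value = "1" + e_value
    # The identifier is the first token; the rest is the definition line,
    # which may be truncated with "..." in the report.
    identifier, _, description = head.partition(" ")
    return HitSummary(identifier, description.strip(), float(score), float(e_value))


hit = parse_one_line_description(
    "sp|P02144|MYG_HUMAN MYOGLOBIN                                         319  2e-88"
)
print(hit.identifier, hit.description, hit.bit_score, hit.e_value)
```

Splitting from the right avoids having to guess where a truncated definition line ends, since only the last two columns are guaranteed to be free of embedded spaces.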

The identifiers in the one-line descriptions above are all from Swiss-Prot, so each has 'sp' in the first field, followed by the accession number and then the entry name. The sequence identifier syntax, listed below, is quite specific about the information each field should contain.

SOURCE                          SYNTAX
GenBank                         gb|accession|locus
EMBL                            emb|accession|locus
DNA Database of Japan           dbj|accession|locus
NBRF PIR                        pir||entry
Protein Research Foundation     prf||name
SWISS-PROT                      sp|accession|entry name
Protein Data Bank               pdb|entry|chain
Patents                         pat|country|number
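The identifier syntaxes listed above can be turned into a small lookup that labels each field. This is a sketch only: the dictionary `ID_FIELDS` and the function `parse_identifier` are hypothetical names, and the code covers just the eight syntaxes listed here, with exactly two fields after the database tag.

```python
# Field names for the two positions after the database tag.
# None marks a position that the syntax leaves empty (e.g. pir||entry).
ID_FIELDS = {
    "gb": ("accession", "locus"),
    "emb": ("accession", "locus"),
    "dbj": ("accession", "locus"),
    "pir": (None, "entry"),
    "prf": (None, "name"),
    "sp": ("accession", "entry_name"),
    "pdb": ("entry", "chain"),
    "pat": ("country", "number"),
}


def parse_identifier(identifier: str) -> dict:
    """Label the fields of a database identifier such as sp|P02144|MYG_HUMAN."""
    tag, *values = identifier.split("|")
    if tag not in ID_FIELDS:
        raise ValueError(f"unrecognized database tag: {tag!r}")
    names = ID_FIELDS[tag]
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields after {tag!r}")
    parsed = {"database": tag}
    for name, value in zip(names, values):
        if name is not None:  # skip positions the syntax leaves empty
            parsed[name] = value
    return parsed


print(parse_identifier("sp|P02144|MYG_HUMAN"))
```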

GI Numbers

A unique 'gi' number is assigned by NCBI to every sequence in its databases. This provides a uniform and stable naming convention in which each sequence has its own 'gi' identifier, independent of the accession syntax of the database it came from. If a nucleotide or protein sequence changes, a new gi identifier is assigned, even if the accession number remains unchanged. This means that gi identifiers also identify the exact sequence that was used or retrieved in a given search.

For WWW BLAST output, the 'NCBI-gi' checkbox option will produce a header line with the gi identifier concatenated with the conventional identifier for the database from which it was derived.

An example, from a nucleotide database:

gi|654321|gb|M73307|AGMA13GT

And similarly for protein databases:

gi|123456|sp|P02144|MYG_HUMAN
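When identifiers may or may not carry the gi prefix, it can help to separate the gi number from the database-specific part before further processing. A minimal sketch, assuming the concatenated form shown in the two examples above; `split_gi` is an illustrative name:

```python
def split_gi(identifier: str):
    """Return (gi_number, native_identifier); gi_number is None if absent."""
    # Split off at most the first two fields: "gi" and the number.
    fields = identifier.split("|", 2)
    if len(fields) == 3 and fields[0] == "gi" and fields[1].isdigit():
        return int(fields[1]), fields[2]
    return None, identifier


print(split_gi("gi|123456|sp|P02144|MYG_HUMAN"))
print(split_gi("sp|P02144|MYG_HUMAN"))
```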

ALIGNMENTS

Each alignment is preceded by the sequence identifier, the full definition line, and the length of the database sequence. Next come the score (in bits, with the raw score in parentheses) and the statistical significance of the match, followed by the number of identities and positive matches according to the scoring system (e.g., BLOSUM62) and, if applicable, the number of gaps in the alignment.
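These statistics appear on lines of the form `Score = 70.2 bits (169), Expect = 2e-13` and `Identities = 40/148 (27%), Positives = 69/148 (46%), Gaps = 6/148 (4%)`. The sketch below pulls the numbers out with regular expressions; the patterns are written against the text layout in this example and are an assumption, not a specification of the format.

```python
import re

# Patterns written against the example report in this document.
SCORE_RE = re.compile(
    r"Score\s*=\s*([\d.]+)\s+bits\s+\((\d+)\),\s*Expect\s*=\s*(\S+)"
)
COUNT_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_alignment_stats(lines):
    """Collect score, E value and match counts from the alignment header lines."""
    stats = {}
    for line in lines:
        match = SCORE_RE.search(line)
        if match:
            stats["bits"] = float(match.group(1))
            stats["raw_score"] = int(match.group(2))
            stats["expect"] = float(match.group(3))
        # Each count is stored as (matches, alignment_length).
        for name, count, length in COUNT_RE.findall(line):
            stats[name.lower()] = (int(count), int(length))
    return stats


stats = parse_alignment_stats([
    " Score = 70.2 bits (169), Expect = 2e-13",
    " Identities = 40/148 (27%), Positives = 69/148 (46%), Gaps = 6/148 (4%)",
])
print(stats)
```

The "gaps" key is simply absent when the report omits the Gaps field, as it does for ungapped alignments.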

Finally, the actual alignment is shown. In the default format the query is on top and the database match, labeled 'Sbjct', is underneath. Between the two sequences is information about the alignment: the residue is shown if it is identical in both sequences, and a '+' is shown if the pair is not identical but has a positive score in the scoring matrix. One or more dashes, '-', within a sequence indicate a gap. The example below shows the alignments for the first and second sequences listed in the one-line descriptions above.

 

>sp|P02144|MYG_HUMAN MYOGLOBIN
           Length = 154

 Score =  319 bits (809), Expect = 2e-88
 Identities = 154/154 (100%), Positives = 154/154 (100%)

Query: 1   MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE 60
           MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE
Sbjct: 1   MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE 60

Query: 61  DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH 120
           DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH
Sbjct: 61  DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH 120

Query: 121 PGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154
           PGDFGADAQGAMNKALELFRKDMASNYKELGFQG
Sbjct: 121 PGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154

>sp|P02008|HBAZ_HUMAN HEMOGLOBIN ZETA CHAIN
           Length = 142

 Score = 70.2 bits (169), Expect = 2e-13
 Identities = 40/148 (27%), Positives = 69/148 (46%), Gaps = 6/148 (4%)

Query: 1   MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE 60
           M L+  E  +++++W K+       G E L RLF  HP+T   F  F      D    S
Sbjct: 1   MSLTKTERTIIVSMWAKISTQADTIGTETLERLFLSHPQTKTYFPHF------DLHPGSA 54

Query: 61  DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH 120
            L+ HG+ V+ A+G  +K        +  L++ HA   ++     + +S C++  L ++
Sbjct: 55  QLRAHGSKVVAAVGDAVKSIDDIGGALSKLSELHAYILRVDPVNFKLLSHCLLVTLAARF 114

Query: 121 PGDFGADAQGAMNKALELFRKDMASNYK 148
           P DF A+A  A +K L +    +   Y+
Sbjct: 115 PADFTAEAHAAWDKFLSVVSSVLTEKYR 142
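The identity, positive, and gap counts in the alignment header can be recovered from the alignment rows themselves. The sketch below counts them for one block of three rows (query, middle line, Sbjct) after the labels and coordinates have been stripped; `summarize_alignment` is an illustrative name, and the example uses the last block of the hemoglobin zeta alignment above.

```python
def summarize_alignment(query: str, midline: str, sbjct: str) -> dict:
    """Count identities, positives and gaps in one aligned block."""
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("the three alignment rows must have equal length")
    # A letter on the middle line marks an identical pair.
    identities = sum(1 for symbol in midline if symbol.isalpha())
    # Positives include identities plus the '+' (positive-scoring) pairs.
    positives = identities + midline.count("+")
    # A dash in either sequence is one gap position.
    gaps = query.count("-") + sbjct.count("-")
    return {
        "identities": identities,
        "positives": positives,
        "gaps": gaps,
        "length": len(query),
    }


block = summarize_alignment(
    "PGDFGADAQGAMNKALELFRKDMASNYK",
    "P DF A+A  A +K L +    +   Y+",
    "PADFTAEAHAAWDKFLSVVSSVLTEKYR",
)
print(block)
```

Summing these counts over the three blocks of the second alignment gives 40 identities, 69 positives, and 6 gaps over 148 columns, matching the header line of that alignment.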

Here is what alignments look like if the master-slave format with identities option is selected. A dot '.' indicates a residue identical to the one at the same position in the query sequence.

blast_tmp 1    MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE 60
127661    1    ............................................................ 60
122335    1    .S.TKT.RTIIVSM.A.ISTQADTI.T.T.E...LS..Q.KTY.PH.------.LHPG.A 54
122761    12             ITSL....--NVEDA.G.T.G..LVVY.W.QRF..S.GN.S.ASAIMGNP 59
122412    1    .V..PADKTN.KAA....G.HAGEY.A.A.E.M.LSF.T.KTY.PH.------.LSHG.A 54
122330    1    .A..AEDRA..RAL.K.LGSNVGVYTT.A.E.T.LAF.A.KTY.S---..---.LSPG.S 54
122726    12             .TSL.S.M--NVEEA.G.A.G..LVVY.W.QRF..S.GN.S.PSAILGNP 59
1705495   1451                                                   .A.EA.QEAS 1460
6166005   486                                                            A. 487
114014    3221                                   .SYN..KI....Y.AE..H..L     3242

blast_tmp 61   DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH 120
127661    61   ............................................................ 120
122335    55   Q.RA..SK.VA.V.DAV.SIDDIGGALSK.SEL..YILRVDPVNFKLL.H.LLVT.AARF 114
122761    60   KV.A..KK...S..DAI.HLDDLKGTFAQ.SEL.CD.LHVDPENFKLLGNVLVT..AIHF 119
122412    55   QV.G..KK.AD..TNAVAHVDDMPNALSA.SDL..H.LRVDPVNFKLL.H.LLVT.AAHL 114
122330    55   QVRA..QK.AD..SLAVERLDDLPHALSA.SHL..CQLRVDPASFQLLGH.LLVT.ARHY 114
122726    60   KV.A..KK...SF.DAI.NMDNLKPAFAK.SEL.CD.LHVDPENFKLLGNVMVII.ATHF 119
1705495   1461 ....IKRNYQLE.ESLNHE..KLQR.VDRITRA..VAE.                      1499
6166005   488  ..DLE.E.L..PITH.SQLRE..R.T..                                 515
116609    386                  ..NS.LT.E.A.HCVRIETK.RVLFA.KTKVEHR.TTNK.SE.. 429

blast_tmp 121  PGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154
127661    121  .................................. 154
122335    115  .A..T.E.HA.WD.F.SVVSSVLTEK.R       142
122761    120  GKE.TPEV.ASWQ.MVTAVASALS.R.        146
122412    115  .AE.TPAVHASLD.F.ASVSTVLT.K.R       142
122330    115  ....SPAL.ASLD.F.SHVISALV.E.R       142
122726    120  GKE.TPEV.A.WQ.LVSAVAIAL.HK.        146
116609    430  E.S.---I...-E.SIS.I.               445
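To recover the actual residues of a database sequence from this view, each dot can be replaced by the query residue in the same column. A minimal sketch, assuming the row label and coordinates have already been removed so that both strings start at the same column; `expand_identities` is an illustrative name, and the example uses the first block of the query row (blast_tmp) and row 122335 above.

```python
def expand_identities(query_row: str, hit_row: str) -> str:
    """Replace each '.' in hit_row with the query residue in that column."""
    # Gaps ('-'), leading spaces and mismatched residues are kept as-is.
    return "".join(
        q if h == "." else h for q, h in zip(query_row, hit_row)
    )


query_row = "MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE"
hit_row = ".S.TKT.RTIIVSM.A.ISTQADTI.T.T.E...LS..Q.KTY.PH.------.LHPG.A"
print(expand_identities(query_row, hit_row))
```

For this row the result is the same string as the first Sbjct line of the hemoglobin zeta chain alignment shown in the default format.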

PARAMETER SUMMARY

The final section of the BLAST report summarizes important information about the search. Database information includes its name, revision date, size, and number of sequences. Following this are the Karlin & Altschul parameters K and Lambda, specific to this search, which are used to determine the significance of each match. Other information includes statistics about how the BLAST algorithm performed during the search.

  Database: Non-redundant SwissProt sequences
    Posted date:  Dec 20, 1999  8:21 PM
  Number of letters in database: 29,652,561
  Number of sequences in database:  82,258

Lambda     K      H
   0.316    0.136    0.398

Gapped
Lambda     K      H
   0.270   0.0470    0.230


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 856610
Number of Sequences: 82258
Number of extensions: 35161
Number of successful extensions: 109
Number of sequences better than 10.0: 22
Number of HSP's better than 10.0 without gapping: 13
Number of HSP's successfully gapped in prelim test: 9
Number of HSP's that attempted gapping in prelim test: 93
Number of HSP's gapped (non-prelim): 22
length of query: 154
length of database: 2,621,006
effective HSP length: 43
effective length of query: 111
effective length of database: 2,383,517
effective search space: 264570387
effective search space used: 264570387
T: 11
A: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.8 bits)
X3: 64 (24.9 bits)
S1: 41 (21.6 bits)
S2: 52 (24.7 bits)
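The values in this summary can be used to check the scores reported for an alignment. The sketch below assumes the standard Karlin-Altschul relations S' = (Lambda * S - ln K) / ln 2 for the bit score and E = N * 2^(-S') for the expect value, where S is the raw score and N is the effective search space. The function names are illustrative, and because the report prints rounded values of Lambda and K, the recomputed figures are only approximate.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score from a raw score and Karlin-Altschul parameters."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, search_space: float) -> float:
    """Expected number of chance hits scoring at least `bits`."""
    return search_space * 2.0 ** (-bits)


# Values copied from the parameter summary above.
effective_query_length = 154 - 43           # query length - effective HSP length
effective_db_length = 2_383_517
search_space = effective_query_length * effective_db_length

# Hemoglobin zeta chain hit: raw score 169, gapped Lambda and K.
bits = bit_score(169, lam=0.270, k=0.0470)
print(search_space)
print(round(bits, 1))
print(expect_value(bits, search_space))
```

The product of the effective lengths reproduces the "effective search space" line of the summary, and the recomputed bit score and E value come out close to the 70.2 bits and 2e-13 shown for this alignment.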