
The BLAST report is divided into several logical sections, each explained below. Although the example is the result of a blastp search, results from the other programs are analogous. BLAST results can be retrieved as plain text or formatted as HTML for your browser. The stand-alone version of BLAST and the BLAST e-mail server return a flat text file, which may be preferable if any user-implemented post-processing is planned.
The BLAST report begins with header information that lists the program type, the version, and a release date. Next comes a reference for the BLAST program; this is what should be cited if search results are published.
Finally, the full definition line of the query sequence and a summary of the database searched are shown.
BLASTP 2.0.10 [Aug-26-1999]
Reference:
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= gi|4885477|ref|NP_005359.1|pMB| myoglobin
(154 letters)
Database: Non-redundant SwissProt sequences
82,258 sequences; 29,652,561 total letters
If you have any problems or questions with the results of this search
please refer to the BLAST FAQs
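For post-processing a text report, the header fields described above can be pulled out with two small regular expressions. This is a minimal sketch: the function names `parse_header` and `parse_db_size` are hypothetical, and the patterns assume the exact line layout shown in this example rather than every BLAST release.

```python
import re

# Assumed layout of the first report line: "PROGRAM VERSION [DATE]".
HEADER_RE = re.compile(r"^(?P<program>\S+)\s+(?P<version>\S+)\s+\[(?P<date>[^\]]+)\]$")
# Assumed layout of the database size line: "N sequences; M total letters".
DB_SIZE_RE = re.compile(r"^([\d,]+) sequences; ([\d,]+) total letters$")

def parse_header(line):
    """Return program, version and release date from the first report line."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a BLAST header line: {line!r}")
    return match.groupdict()

def parse_db_size(line):
    """Return (number of sequences, total letters) as integers."""
    match = DB_SIZE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a database size line: {line!r}")
    sequences, letters = (int(g.replace(",", "")) for g in match.groups())
    return sequences, letters

header = parse_header("BLASTP 2.0.10 [Aug-26-1999]")
print(header["program"], header["version"], header["date"])
print(parse_db_size("82,258 sequences; 29,652,561 total letters"))
```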
The second section of the report consists of one-line descriptions of the top database matches. Each one-line description includes a database sequence identifier, the definition line, the normalized score S' (in bits), and the statistical significance of the match (the 'E value').
The first match, in this case, is the query sequence itself. The fields are interpreted as follows:
Identifier: sp|P02144|MYG_HUMAN
Description: MYOGLOBIN
Normalized Score: 319
Expect Value: 2e-88
Score E
Sequences producing significant alignments: (bits) Value
sp|P02144|MYG_HUMAN MYOGLOBIN 319 2e-88
sp|P02008|HBAZ_HUMAN HEMOGLOBIN ZETA CHAIN 70 2e-13
sp|P02096|HBG_HUMAN HEMOGLOBIN GAMMA-A AND GAMMA-G CHAINS 57 1e-09
sp|P01922|HBA_HUMAN HEMOGLOBIN ALPHA CHAIN 52 5e-08
sp|P09105|HBAT_HUMAN HEMOGLOBIN THETA-1 CHAIN 52 5e-08
sp|P02100|HBE_HUMAN HEMOGLOBIN EPSILON CHAIN 52 5e-08
sp|Q03001|BPA1_HUMAN BULLOUS PEMPHIGOID ANTIGEN 1 (BPA) (HEMIDES... 28 0.87
sp|P51787|CIQ1_HUMAN VOLTAGE-GATED POTASSIUM CHANNEL PROTEIN KQT... 27 1.5
sp|P04114|APB_HUMAN APOLIPOPROTEIN B-100 PRECURSOR (APO B-100) [... 27 2.6
sp|Q15109|RAGE_HUMAN ADVANCED GLYCOSYLATION END PRODUCT-SPECIFIC... 26 4.4
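The field breakdown above can be applied to every one-line description with a short parser. The sketch below assumes that the score and E value are always the last two whitespace-separated tokens and that the identifier contains no spaces; `parse_hit_line` is a hypothetical helper name, not part of BLAST.

```python
def parse_hit_line(line):
    """Split a one-line description into identifier, description, score, E value."""
    # Score and E value are taken from the right so that spaces inside
    # the description (including a truncated "...") do not matter.
    rest, score, evalue = line.rsplit(None, 2)
    identifier, description = rest.split(None, 1)
    return {
        "identifier": identifier,
        "description": description,
        "bits": float(score),
        "expect": float(evalue),
    }

hit = parse_hit_line("sp|P02008|HBAZ_HUMAN HEMOGLOBIN ZETA CHAIN 70 2e-13")
print(hit["identifier"], hit["description"], hit["bits"], hit["expect"])
```

Splitting from the right keeps the parser independent of how long the description is, which matters because BLAST truncates long definition lines.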
The identifiers shown here are all from Swiss-Prot, so each has 'sp' in the first field, followed by the accession number and then the entry name. The sequence identifier syntax, listed below, specifies exactly which fields each source database contributes.
| Database name | Identifier syntax |
| GenBank | gb|accession|locus |
| EMBL | emb|accession|locus |
| DNA Database of Japan | dbj|accession|locus |
| NBRF PIR | pir||entry |
| Protein Research Foundation | prf||name |
| SWISS-PROT | sp|accession|entry name |
| Protein Data Bank | pdb|entry|chain |
| Patents | pat|country|number |
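The syntax table translates directly into a lookup from the database tag to the names of the two fields that follow it. The sketch below is one possible encoding; `ID_FIELDS` and `parse_identifier` are hypothetical names, and `None` marks the empty field in the `pir||entry` and `prf||name` forms.

```python
# Field names per database tag, taken from the table above.
ID_FIELDS = {
    "gb": ("accession", "locus"),
    "emb": ("accession", "locus"),
    "dbj": ("accession", "locus"),
    "pir": (None, "entry"),
    "prf": (None, "name"),
    "sp": ("accession", "entry_name"),
    "pdb": ("entry", "chain"),
    "pat": ("country", "number"),
}

def parse_identifier(identifier):
    """Map a 'tag|field1|field2' identifier to a dictionary of named fields."""
    parts = identifier.split("|")
    tag = parts[0]
    if tag not in ID_FIELDS or len(parts) != 3:
        raise ValueError(f"unrecognized identifier: {identifier!r}")
    result = {"database": tag}
    for name, value in zip(ID_FIELDS[tag], parts[1:]):
        if name is not None:  # skip the empty slot in pir|| and prf||
            result[name] = value
    return result

print(parse_identifier("sp|P02144|MYG_HUMAN"))
```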
NCBI assigns a unique 'gi' number to every sequence contained in its databases. This provides a uniform and stable naming convention: each sequence has its own 'gi' identifier, independent of the accession syntax of the database it came from. If a nucleotide or protein sequence changes, a new gi identifier is assigned, even if the accession number remains unchanged. A gi identifier therefore also identifies the exact sequence that was used or retrieved in a given search.
For WWW BLAST output, the 'NCBI-gi' checkbox option produces a header line in which the gi identifier is concatenated with the conventional identifier of the database from which the sequence was derived.
An example, from a nucleotide database:
gi|654321|gb|M73307|AGMA13GT
And similarly for protein databases:
gi|123456|sp|P02144|MYG_HUMAN
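When the gi number is present, it can be separated from the database-specific identifier before any further parsing. A minimal sketch, assuming the gi number is always the second field after a literal `gi` tag; `split_gi` is a hypothetical helper name.

```python
def split_gi(identifier):
    """Return (gi number or None, remaining database identifier)."""
    parts = identifier.split("|", 2)
    if len(parts) == 3 and parts[0] == "gi" and parts[1].isdigit():
        return int(parts[1]), parts[2]
    # No gi prefix: hand the identifier back unchanged.
    return None, identifier

print(split_gi("gi|654321|gb|M73307|AGMA13GT"))
print(split_gi("sp|P02144|MYG_HUMAN"))
```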
Each alignment is preceded by the sequence identifier, the full definition line, and the length of the database sequence. Next come the score (in bits, with the raw score in parentheses) and the statistical significance of the match, followed by the number of identities, the number of positive matches according to the scoring system (e.g., BLOSUM62) and, if applicable, the number of gaps in the alignment.
Finally, the actual alignment is shown. In the default format the query is on top and the database match, labeled 'Sbjct', is underneath. Between the two sequences is information about the alignment: the residue is shown if it is conserved, and a '+' is shown if the substitution has a positive score in the scoring matrix. One or more dashes, '-', indicate a gap. The example below shows the alignments for the first and second sequences listed in the one-line descriptions above.
>sp|P02144|MYG_HUMAN MYOGLOBIN
Length = 154
Score = 319 bits (809), Expect = 2e-88
Identities = 154/154 (100%), Positives = 154/154 (100%)
Query: 1 MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE 60
MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE
Sbjct: 1 MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE 60
Query: 61 DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH 120
DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH
Sbjct: 61 DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH 120
Query: 121 PGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154
PGDFGADAQGAMNKALELFRKDMASNYKELGFQG
Sbjct: 121 PGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154
>sp|P02008|HBAZ_HUMAN HEMOGLOBIN ZETA CHAIN
Length = 142
Score = 70.2 bits (169), Expect = 2e-13
Identities = 40/148 (27%), Positives = 69/148 (46%), Gaps = 6/148 (4%)
Query: 1 MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE 60
M L+ E +++++W K+ G E L RLF HP+T F F D S
Sbjct: 1 MSLTKTERTIIVSMWAKISTQADTIGTETLERLFLSHPQTKTYFPHF------DLHPGSA 54
Query: 61 DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH 120
L+ HG+ V+ A+G +K + L++ HA ++ + +S C++ L ++
Sbjct: 55 QLRAHGSKVVAAVGDAVKSIDDIGGALSKLSELHAYILRVDPVNFKLLSHCLLVTLAARF 114
Query: 121 PGDFGADAQGAMNKALELFRKDMASNYK 148
P DF A+A A +K L + + Y+
Sbjct: 115 PADFTAEAHAAWDKFLSVVSSVLTEKYR 142
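The score and count lines that precede each alignment are regular enough to parse with two patterns. The sketch below assumes the layout shown in this example; `parse_alignment_stats` and `percent` are hypothetical names. Truncating the percentage to an integer reproduces the values printed here (27%, 46%, 4%), although that rounding rule is inferred from this example only.

```python
import re

SCORE_RE = re.compile(r"Score\s*=\s*([\d.]+) bits \((\d+)\),\s*Expect\s*=\s*(\S+)")
COUNT_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")

def parse_alignment_stats(lines):
    """Collect bit score, raw score, E value and match counts from stat lines."""
    stats = {}
    for line in lines:
        score = SCORE_RE.search(line)
        if score:
            stats["bits"] = float(score.group(1))
            stats["raw_score"] = int(score.group(2))
            stats["expect"] = float(score.group(3))
        for name, hits, length in COUNT_RE.findall(line):
            stats[name.lower()] = (int(hits), int(length))
    return stats

def percent(hits, length):
    """Integer-truncated percentage, as inferred from the example output."""
    return 100 * hits // length

stats = parse_alignment_stats([
    "Score = 70.2 bits (169), Expect = 2e-13",
    "Identities = 40/148 (27%), Positives = 69/148 (46%), Gaps = 6/148 (4%)",
])
print(stats)
print(percent(*stats["identities"]), percent(*stats["positives"]), percent(*stats["gaps"]))
```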
Here is what the alignments look like if the master-slave format with identities option is selected. A dot '.' indicates an amino acid identical to the one at the corresponding position in the query sequence.
blast_tmp 1 MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE 60
127661 1 ............................................................ 60
122335 1 .S.TKT.RTIIVSM.A.ISTQADTI.T.T.E...LS..Q.KTY.PH.------.LHPG.A 54
122761 12 ITSL....--NVEDA.G.T.G..LVVY.W.QRF..S.GN.S.ASAIMGNP 59
122412 1 .V..PADKTN.KAA....G.HAGEY.A.A.E.M.LSF.T.KTY.PH.------.LSHG.A 54
122330 1 .A..AEDRA..RAL.K.LGSNVGVYTT.A.E.T.LAF.A.KTY.S---..---.LSPG.S 54
122726 12 .TSL.S.M--NVEEA.G.A.G..LVVY.W.QRF..S.GN.S.PSAILGNP 59
1705495 1451 .A.EA.QEAS 1460
6166005 486 A. 487
114014 3221 .SYN..KI....Y.AE..H..L 3242
blast_tmp 61 DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH 120
127661 61 ............................................................ 120
122335 55 Q.RA..SK.VA.V.DAV.SIDDIGGALSK.SEL..YILRVDPVNFKLL.H.LLVT.AARF 114
122761 60 KV.A..KK...S..DAI.HLDDLKGTFAQ.SEL.CD.LHVDPENFKLLGNVLVT..AIHF 119
122412 55 QV.G..KK.AD..TNAVAHVDDMPNALSA.SDL..H.LRVDPVNFKLL.H.LLVT.AAHL 114
122330 55 QVRA..QK.AD..SLAVERLDDLPHALSA.SHL..CQLRVDPASFQLLGH.LLVT.ARHY 114
122726 60 KV.A..KK...SF.DAI.NMDNLKPAFAK.SEL.CD.LHVDPENFKLLGNVMVII.ATHF 119
1705495 1461 ....IKRNYQLE.ESLNHE..KLQR.VDRITRA..VAE. 1499
6166005 488 ..DLE.E.L..PITH.SQLRE..R.T.. 515
116609 386 ..NS.LT.E.A.HCVRIETK.RVLFA.KTKVEHR.TTNK.SE.. 429
blast_tmp 121 PGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154
127661 121 .................................. 154
122335 115 .A..T.E.HA.WD.F.SVVSSVLTEK.R 142
122761 120 GKE.TPEV.ASWQ.MVTAVASALS.R. 146
122412 115 .AE.TPAVHASLD.F.ASVSTVLT.K.R 142
122330 115 ....SPAL.ASLD.F.SHVISALV.E.R 142
122726 120 GKE.TPEV.A.WQ.LVSAVAIAL.HK. 146
116609 430 E.S.---I...-E.SIS.I. 445
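The dot notation can be reversed by copying the query residue wherever a row shows '.'. The sketch below uses a hypothetical helper, `expand_row`, and assumes the row spans the same columns as the query line; rows that cover only part of the query would first need their original column offset, which is not recoverable from the text as shown here.

```python
def expand_row(query, row):
    """Replace each '.' in a master-slave row with the query residue above it."""
    if len(query) != len(row):
        raise ValueError("query and row must cover the same columns")
    return "".join(q if r == "." else r for q, r in zip(query, row))

query = "MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE"
row = ".S.TKT.RTIIVSM.A.ISTQADTI.T.T.E...LS..Q.KTY.PH.------.LHPG.A"
# Row 122335 expands to the same text as the first 'Sbjct' line of the
# hemoglobin zeta chain alignment in the default format.
print(expand_row(query, row))
```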
The final section of the BLAST output report summarizes important information about the search. Database information includes its name, posting date, size, and number of sequences. Following this are the Karlin & Altschul parameters K and Lambda, specific to this search, which are used to determine the significance of each match. Other information includes statistics about how the BLAST algorithm performed during the search.
Database: Non-redundant SwissProt sequences
Posted date: Dec 20, 1999 8:21 PM
Number of letters in database: 29,652,561
Number of sequences in database: 82,258
Lambda K H
0.316 0.136 0.398
Gapped
Lambda K H
0.270 0.0470 0.230
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 856610
Number of Sequences: 82258
Number of extensions: 35161
Number of successful extensions: 109
Number of sequences better than 10.0: 22
Number of HSP's better than 10.0 without gapping: 13
Number of HSP's successfully gapped in prelim test: 9
Number of HSP's that attempted gapping in prelim test: 93
Number of HSP's gapped (non-prelim): 22
length of query: 154
length of database: 2,621,006
effective HSP length: 43
effective length of query: 111
effective length of database: 2,383,517
effective search space: 264570387
effective search space used: 264570387
T: 11
A: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.8 bits)
X3: 64 (24.9 bits)
S1: 41 (21.6 bits)
S2: 52 (24.7 bits)
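The footer values are enough to reproduce the numbers reported for a match, using the standard Karlin-Altschul relations: the bit score is S' = (Lambda * S - ln K) / ln 2 and the expect value is E = (effective search space) * 2^(-S'). The sketch below applies them to the hemoglobin zeta chain hit (raw score 169) with the gapped Lambda and K printed above. The function names are hypothetical, and because the printed parameters are rounded, the results agree with the report only approximately (roughly 70.2 bits and an E value near 2e-13).

```python
import math

def bit_score(raw_score, lam, k):
    """Normalized score S' in bits from a raw score and the Lambda, K parameters."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect_value(bits, search_space):
    """Expected number of chance matches scoring at least `bits`."""
    return search_space * 2.0 ** (-bits)

def effective_search_space(query_len, hsp_len, effective_db_len):
    """Effective query length times the effective database length from the footer."""
    return (query_len - hsp_len) * effective_db_len

space = effective_search_space(154, 43, 2_383_517)
bits = bit_score(169, lam=0.270, k=0.0470)  # gapped Lambda and K from the footer
print(space)
print(round(bits, 1), expect_value(bits, space))
```

The product of the effective query length (154 - 43 = 111) and the effective database length matches the "effective search space" line of the footer exactly.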