BLAST Tool

Blast type  

Databases

  • Annotated ORFs nucleotide: C. glabrata, D. hansenii, E. gossypii, K. lactis, K. thermotolerans, S. cerevisiae, S. kluyveri, Y. lipolytica, Z. rouxii
  • Annotated ORFs amino acid: C. glabrata, D. hansenii, E. gossypii, K. lactis, K. thermotolerans, S. cerevisiae, S. kluyveri, Y. lipolytica, Z. rouxii
  • Complete sequences (nt): C. glabrata, D. hansenii, E. gossypii, K. lactis, K. thermotolerans, S. cerevisiae, S. kluyveri, Y. lipolytica, Z. rouxii
  • Partial sequences (nt): Génolevures 1 RSTs (All species), Génolevures 1 RSTs (species using Alternative Yeast Nuclear code), Génolevures 1 RSTs (species using Standard code)
Enter here your input data as a sequence in FASTA format:

Or load it



Filter
Expect
Matrix
Perform alignment
Query Genetic Codes (blastx and tblastx only)
Frame shift penalty (blastx only)
Other advanced options

BLAST Manual

Program Types  


  • blastp: compares an amino acid query sequence against a protein sequence database
  • blastn: compares a nucleotide query sequence against a nucleotide sequence database
  • blastx: compares a nucleotide query sequence translated in all reading frames against a protein sequence database
  • tblastn: compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
  • tblastx: compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
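
The five program types differ only in the kind of query and database they accept and in which side is translated. A minimal sketch of that lookup follows; the names PROGRAMS and programs_for are hypothetical and are not part of this tool.

```python
# Query/database types as stated in the program descriptions above.
# "translated" lists which side BLAST translates before comparing.
PROGRAMS = {
    "blastp":  {"query": "protein",    "database": "protein",    "translated": ()},
    "blastn":  {"query": "nucleotide", "database": "nucleotide", "translated": ()},
    "blastx":  {"query": "nucleotide", "database": "protein",    "translated": ("query",)},
    "tblastn": {"query": "protein",    "database": "nucleotide", "translated": ("database",)},
    "tblastx": {"query": "nucleotide", "database": "nucleotide", "translated": ("query", "database")},
}


def programs_for(query_type, database_type):
    """Return the programs that accept this query/database combination."""
    return sorted(
        name
        for name, spec in PROGRAMS.items()
        if spec["query"] == query_type and spec["database"] == database_type
    )


if __name__ == "__main__":
    # A nucleotide query against a nucleotide database can be searched
    # directly (blastn) or through six-frame translations (tblastx).
    print(programs_for("nucleotide", "nucleotide"))
```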

Databases  


Annotated ORFs

  • S. cerevisiae: Saccharomyces cerevisiae nucleotide sequences of annotated ORFs, available by anonymous FTP from SGD (2008/10/28).
  • E. gossypii: Eremothecium gossypii nucleotide sequences of annotated ORFs, available by anonymous FTP from EBI (2008/10/27).
  • C. glabrata, D. hansenii, K. lactis, K. thermotolerans, S. kluyveri, Y. lipolytica, Z. rouxii: nucleotide sequences of each species manually annotated by Génolevures.
Note: Debaryomyces hansenii uses the Alternative Yeast Nuclear genetic code.

Complete sequences

  • S. cerevisiae: Saccharomyces cerevisiae complete genomic sequence: 16 nuclear chromosomes plus the mitochondrial chromosome, available by anonymous FTP from SGD (2008/10/28).
  • E. gossypii: Eremothecium gossypii complete genomic sequence: 7 nuclear chromosomes plus the mitochondrial chromosome, available by anonymous FTP from EBI (2008/10/27).
  • C. glabrata, D. hansenii, K. lactis, K. thermotolerans, S. kluyveri, Y. lipolytica, Z. rouxii: complete genomic sequence: nuclear chromosomes or contigs (Génolevures).
Note: Debaryomyces hansenii uses the Alternative Yeast Nuclear genetic code.

Partial sequences

  • RST Génolevures: about 50,000 Random Sequence Tags (up to 1 kb) from 13 yeast species representative of the various branches of the Hemiascomycetous class (2500-5000 RSTs per species, single-pass sequencing) (Génolevures, 01/12/2002).
  • RST Génolevures (Standard) / RST Génolevures (Alternative Yeast Nuclear): among the 13 species studied in the Génolevures project, 3 species use the Alternative Yeast Nuclear genetic code: Debaryomyces hansenii, Pichia sorbitophila and Candida tropicalis.
    RSTs are therefore split between the RST Génolevures (Standard, 10 species) and RST Génolevures (Alternative Yeast Nuclear, 3 species) databases, so that the cognate genetic code is used when computing translated products (Génolevures, 01/12/2002).

Input sequence  


Input sequence formatting

The input sequence is modified as follows to comply with a valid FASTA format, compatible with the type of sequence needed for the chosen BLAST program (i.e. nucleic acid sequence for BLASTN, BLASTX and TBLASTX; protein sequence for BLASTP and TBLASTN).
  • Defline: if the first line of the input sequence (definition line) does not begin with a ">", the defline ">raw sequence" is added and all the characters of the input are processed as sequence (see next paragraph).
  • Sequence: only the characters listed below (IUPAC codes) are accepted; all other characters are automatically removed.
    Nucleic acid sequence
    A : Adenosine    R : G A (puRine)        B : G T C (not A)           N : A G C T (aNy)
    C : Cytidine     Y : T C (pYrimidine)    D : G A T (not C)           - : gap
    G : Guanine      K : G T (Keto)          H : A C T (not G)
    T : Thymidine    M : A C (aMino)         V : G C A (not T, not U)
    U : Uridine      S : G C (Strong)
                     W : A T (Weak)
    Protein sequence
    A : alanine                    H : histidine     Q : glutamine         Y : tyrosine
    B : aspartate or asparagine    I : isoleucine    R : arginine          Z : glutamate or glutamine
    C : cysteine                   K : lysine        S : serine            X : any
    D : aspartate                  L : leucine       T : threonine         * : translation stop
    E : glutamate                  M : methionine    U : selenocysteine    - : gap
    F : phenylalanine              N : asparagine    V : valine
    G : glycine                    P : proline       W : tryptophan
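
The two formatting rules above (add a defline when it is missing, drop every character that is not an accepted code) can be sketched as follows. This is an illustration only, not the code used by this tool: the function name normalize_input is hypothetical, and converting the input to upper case before filtering is an assumption, since the text above does not say how lower-case letters are handled.

```python
# Accepted characters, copied from the two code tables above.
NUCLEOTIDE_CODES = set("ACGTURYKMSWBDHVN-")
PROTEIN_CODES = set("ABCDEFGHIKLMNPQRSTUVWXYZ*-")

# Programs whose query is a nucleic acid sequence.
NUCLEOTIDE_PROGRAMS = {"blastn", "blastx", "tblastx"}


def normalize_input(text, program):
    """Return the input as a two-line FASTA record (defline, sequence)."""
    if program.lower() in NUCLEOTIDE_PROGRAMS:
        allowed = NUCLEOTIDE_CODES
    else:
        allowed = PROTEIN_CODES

    lines = text.strip().splitlines()
    if lines and lines[0].startswith(">"):
        defline, body = lines[0], lines[1:]
    else:
        # No definition line: every line of the input is sequence.
        defline, body = ">raw sequence", lines

    # Assumption: letters are upper-cased before filtering.
    sequence = "".join(
        ch for line in body for ch in line.upper() if ch in allowed
    )
    return f"{defline}\n{sequence}"


if __name__ == "__main__":
    print(normalize_input("acgt 123 nnX\nACG", "blastn"))
```

In the usage line, the digits, the spaces and the "X" (not a nucleotide code) are removed, and the missing defline is replaced by ">raw sequence".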


BLAST Search parameters  


Filter

  • Low-complexity: Mask off segments of the query sequence that have low compositional complexity, as determined by the SEG program of Wootton & Federhen (Computers and Chemistry, 1993), or segments consisting of short-periodicity internal repeats, as determined by the XNU program of Claverie & States (Computers and Chemistry, 1993), or, for BLASTN, by the DUST program of Tatusov and Lipman (in preparation). Filtering can eliminate statistically significant but biologically uninteresting reports from the blast output (e.g., hits against common acidic-, basic- or proline-rich regions), leaving the more biologically interesting regions of the query sequence available for specific matching against database sequences. Low complexity sequence found by a filter program is substituted using the letter "N" in nucleotide sequence (e.g., "NNNNNNNNNNNNN") and the letter "X" in protein sequences (e.g., "XXXXXXXXX").

Users may turn off filtering by selecting the "None" value.

Filtering is only applied to the query sequence (or its translation products), not to database sequences. Default filtering is DUST for BLASTN, SEG for other programs. It is not unusual for nothing at all to be masked by SEG, XNU, or both, when applied to sequences in SWISS-PROT, so filtering should not be expected to always yield an effect. Furthermore, in some cases, sequences are masked in their entirety, indicating that the statistical significance of any matches reported against the unfiltered query sequence should be suspect.

Expect

The statistical significance threshold for reporting matches against database sequences. The default value is 10, meaning that 10 matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990). If the expect value (E-value) ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Fractional values are acceptable.
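
For readers who want to see how the threshold acts, here is a minimal sketch based on the usual Karlin-Altschul form E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m the query length and n the database length. The function names are hypothetical, and the lambda and K values in the usage lines are illustrative placeholders, not parameters taken from this page; real values depend on the scoring matrix and gap costs.

```python
import math


def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance matches scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def is_reported(e_value, expect_threshold=10.0):
    """A match is dropped when its E-value exceeds the EXPECT threshold."""
    return e_value <= expect_threshold


if __name__ == "__main__":
    # Placeholder statistical parameters (assumptions for illustration).
    lam, k = 0.267, 0.041
    for score in (30, 60, 120):
        e = expect_value(score, query_len=300, db_len=5_000_000, lam=lam, k=k)
        print(score, f"{e:.3g}", is_reported(e))
```

Raising the score lowers the E-value exponentially, which is why a lower EXPECT threshold keeps only the higher-scoring matches.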

Matrix

Specify an alternate scoring matrix for BLASTP, BLASTX, TBLASTN and TBLASTX. The default matrix is BLOSUM62 (Henikoff & Henikoff, 1992). The valid alternative choices include: PAM40, PAM120, PAM250 and IDENTITY. No alternate scoring matrices are available for BLASTN; specifying the MATRIX directive in BLASTN requests returns an error response.

Perform Alignment

The "gapped" option of Perform Alignment allows gaps to be introduced into sequence alignments. This default option ensures that any similarities, even those that define only a domain within the coding region, will be identified if the extent of local similarity is high enough. The default gapped setting of BLAST 2.0 reports the best local alignments and is suitable for most applications. An ungapped search, on the other hand, may be desirable when hits that align to the entire length of the query are most interesting.

Query Genetic Code

Genetic code to be used in blastx translation of the query.
All yeast species in the Génolevures database use the Standard (1) genetic code, except D. hansenii which uses the Alternative Yeast Nuclear (12) genetic code.
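
The practical difference between the two codes is a single codon: CTG (CUG) is read as leucine in the Standard code (1) and as serine in the Alternative Yeast Nuclear code (12). The sketch below illustrates this; the names TABLES and translate are hypothetical, and the tables are built here from the standard codon ordering (bases in the order T, C, A, G) rather than taken from this tool.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (1), one letter per codon in TCAG order.
AA_STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = ["".join(c) for c in product(BASES, repeat=3)]
TABLES = {1: dict(zip(CODONS, AA_STANDARD))}
# Alternative Yeast Nuclear code (12): identical except CTG -> serine.
TABLES[12] = {**TABLES[1], "CTG": "S"}


def translate(dna, code=1):
    """Translate frame 1 of dna; unknown codons become 'X'."""
    dna = dna.upper().replace("U", "T")
    table = TABLES[code]
    return "".join(
        table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


if __name__ == "__main__":
    query = "ATGCTGTAA"
    print(translate(query, code=1))   # Standard code
    print(translate(query, code=12))  # Alternative Yeast Nuclear code
```

Selecting the wrong code for a D. hansenii query would therefore mistranslate every CTG codon in the translated search.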

Frame shift penalty: Out-Of-Frame BLAST notation (OOF)

When a protein is aligned to a nucleotide sequence, there are 6 possible kinds of match at any point. In an OOF alignment, the upper sequence is DNAP (the 3-frame translated DNA) and the lower sequence is the protein. At any position, the next protein residue may be aligned to one of 6 possible positions in DNAP. Each case is listed below in OOF notation and in TBO (Traditional BLAST Output) notation.

  • 0: 3 nucleotides missing - gap (TBO notation "-")
    OOF alignment with DNAP:

    DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLWGG-GVLCV
    | | | | | | | | | | | | | | | | |
    D G T K F A T G G Q G Q D S G K V V

    TBO:

    DGTKFATGGQGQDSG-VV
    DGTKFATGGQGQDSG VV
    DGTKFATGGQGQDSGKVV


  • 1: 2 nucleotides missing - "frameshift -2" (TBO notation "\\")
    OOF alignment with DNAP:

    DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLWGGGGVLCV
    | | | | | | | | | | | | | | |/ | |
    D G T K F A T G G Q G Q D S GK V V

    TBO:

    DGTKFATGGQGQDSG\\GVV
    DGTKFATGGQGQDSG VV
    DGTKFATGGQGQDSG KVV


  • 2: 1 nucleotide missing - "frameshift -1" (TBO notation "\")
    OOF alignment with DNAP:

    DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLWGGERGV
    | | | | | | | | | | | | | | / | |
    D G T K F A T G G Q G Q D S G K V

    TBO:

    DGTKFATGGQGQDS\GEV
    DGTKFATGGQGQDS G V
    DGTKFATGGQGQDS GKV


  • 3: Complete match
    OOF alignment with DNAP:

    DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLWGGEKRGV
    | | | | | | | | | | | | | | | | |
    D G T K F A T G G Q G Q D S G K V

    TBO:

    DGTKFATGGQGQDSGKV
    DGTKFATGGQGQDSGKV
    DGTKFATGGQGQDSGKV


  • 4: 1 nucleotide insertion - "frameshift +1" (TBO notation "/")
    OOF alignment with DNAP:

    DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLWGGVEKRGV
    | | | | | | | | | | | | | | | \
    D G T K F A T G G Q G Q D S G K V

    TBO:

    DGTKFATGGQGQDSG/KV
    DGTKFATGGQGQDSG KV
    DGTKFATGGQGQDSG KV


  • 5: 2-nucleotide insertion - "frameshift +2" (TBO notation "//")
    OOF alignment with DNAP:

    DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLFLWGGEKRGV
    | | | | | | | | | | | | | | \ | |
    D G T K F A T G G Q G Q D S G K V

    TBO:

    DGTKFATGGQGQDS//GKV
    DGTKFATGGQGQDS GKV
    DGTKFATGGQGQDS GKV
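
One way to read the six cases above is as the number of DNAP positions to advance before the next protein residue: 3 for a complete match, fewer for missing nucleotides, more for inserted ones. The sketch below walks a DNAP string with such a list of steps. It is an interpretation of the examples, not the BLAST implementation: the names OOF_STEPS and read_aligned_residues are hypothetical, and treating step 0 as a "-" that consumes no DNAP position is an assumption.

```python
# Step code -> (meaning, TBO notation), as listed above.
OOF_STEPS = {
    0: ("3 nucleotides missing (gap)", "-"),
    1: ("2 nucleotides missing, frameshift -2", "\\\\"),
    2: ("1 nucleotide missing, frameshift -1", "\\"),
    3: ("complete match", ""),
    4: ("1 nucleotide insertion, frameshift +1", "/"),
    5: ("2 nucleotides insertion, frameshift +2", "//"),
}


def read_aligned_residues(dnap, steps):
    """Return the DNAP residues visited, starting at position 0."""
    pos = 0
    residues = [dnap[0]]
    for step in steps:
        if step not in OOF_STEPS:
            raise ValueError(f"unknown OOF step: {step}")
        if step == 0:
            # Assumption: a gap consumes no DNAP position.
            residues.append("-")
            continue
        pos += step
        residues.append(dnap[pos])
    return "".join(residues)


if __name__ == "__main__":
    # DNAP strings copied from the "complete match" and "+1" examples.
    complete = "DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLWGGEKRGV"
    plus_one = "DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLWGGVEKRGV"
    print(read_aligned_residues(complete, [3] * 16))
    print(read_aligned_residues(plus_one, [3] * 14 + [4, 3]))
```

Both calls recover the same residues because the single step of 4 skips the one extra DNAP position introduced by the insertion.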



Other advanced options

  • BLASTN Program Advanced Options:
    -G	Cost to open a gap [Integer], default = 5
    -E	Cost to extend a gap [Integer], default = 2
    -q	Penalty for a mismatch in the blast portion of run [Integer], default = -3
    -r	Reward for a match in the blast portion of run [Integer], default = 1
    -W	Word size [Integer], default = 11
    -v	Number of one-line descriptions [Integer], default = 100
    -b	Number of alignments to show [Integer], default = 100
    

  • BLASTP Program Advanced Options:
    -G  Cost to open a gap [Integer], matrix dependent, default = 11 for BLOSUM62
    -E  Cost to extend a gap [Integer], matrix dependent, default = 1 for BLOSUM62
    -W  Word size [Integer], default = 3
    -v  Number of one-line descriptions (V) [Integer], default = 100
    -b  Number of alignments to show (B) [Integer], default = 100
    

  • BLASTX Program Advanced Options:
    -G  Cost to open a gap [Integer], matrix dependent, default = 11 for BLOSUM62
    -E  Cost to extend a gap [Integer], matrix dependent, default = 1 for BLOSUM62
    -W  Word size [Integer], default = 3
    -v  Number of one-line descriptions (V) [Integer], default = 100
    -b  Number of alignments to show (B) [Integer], default = 100
    

  • TBLASTN Program Advanced Options:
    -D  Genetic code for database translation [Integer], default = 1
    -G  Cost to open a gap [Integer], matrix dependent, default = 11 for BLOSUM62
    -E  Cost to extend a gap [Integer], matrix dependent, default = 1 for BLOSUM62
    -W  Word size [Integer], default = 3
    -v  Number of one-line descriptions (V) [Integer], default = 100
    -b  Number of alignments to show (B) [Integer], default = 100
    

  • TBLASTX Program Advanced Options:
    -D  Genetic code for database translation [Integer], default = 1
    -G  Cost to open a gap [Integer], matrix dependent, default = 11 for BLOSUM62
    -E  Cost to extend a gap [Integer], matrix dependent, default = 1 for BLOSUM62
    -W  Word size [Integer], default = 3
    -v  Number of one-line descriptions (V) [Integer], default = 100
    -b  Number of alignments to show (B) [Integer], default = 100
    

  • Example:
    -G 9 -E 2 -W 2
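
An advanced-options string such as the example above can be checked against the per-program lists before a search is submitted. The sketch below shows one way to do so; the names ALLOWED_OPTIONS and parse_advanced_options are hypothetical, and how this tool actually validates the field is not documented here.

```python
import shlex

# Integer options listed above for each program.
ALLOWED_OPTIONS = {
    "blastn":  {"-G", "-E", "-q", "-r", "-W", "-v", "-b"},
    "blastp":  {"-G", "-E", "-W", "-v", "-b"},
    "blastx":  {"-G", "-E", "-W", "-v", "-b"},
    "tblastn": {"-D", "-G", "-E", "-W", "-v", "-b"},
    "tblastx": {"-D", "-G", "-E", "-W", "-v", "-b"},
}


def parse_advanced_options(text, program):
    """Parse '-G 9 -E 2' style text into {flag: int}, or raise ValueError."""
    allowed = ALLOWED_OPTIONS[program.lower()]
    tokens = shlex.split(text)
    if len(tokens) % 2:
        raise ValueError("options must be given as flag/value pairs")
    parsed = {}
    for flag, value in zip(tokens[0::2], tokens[1::2]):
        if flag not in allowed:
            raise ValueError(f"{flag} is not listed for {program}")
        parsed[flag] = int(value)  # all listed options take an integer
    return parsed


if __name__ == "__main__":
    print(parse_advanced_options("-G 9 -E 2 -W 2", "blastp"))
```

Rejecting unlisted flags early (for example -q with BLASTP) gives a clearer message than a failed search.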