Workflows - BLAST

The Basic Local Alignment Search Tool (BLAST) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotide sequences of DNA. A BLAST search compares a query sequence with a reference library (or database) of sequences and identifies library sequences that resemble the query above a certain threshold.

CAMERA's BLAST workflows use a parallel version of BLAST in which both the input sequences and the reference library are fragmented, the fragments are searched in parallel, and the results are then merged. This reduces computation time and makes CAMERA BLAST well suited to large data sets, such as those emerging from “next-generation sequencing” technologies. In addition to the CAMERA-supported reference libraries, users can create their own custom data libraries for BLAST.
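
The fragment, search, and merge pattern described above can be illustrated with a minimal Python sketch. It is not CAMERA's implementation, which is not described here. The function names and the hit layout `(query_id, subject_id, evalue, bitscore)` are assumptions made for illustration, and the search step itself is left out.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, n_chunks):
    """Distribute records round-robin into at most n_chunks non-empty lists."""
    buckets = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        buckets[i % n_chunks].append(record)
    return [bucket for bucket in buckets if bucket]


def merge_hits(per_chunk_hits, max_hits_per_query=None):
    """Merge hit lists from independent searches.

    Each hit is a hypothetical tuple (query_id, subject_id, evalue, bitscore).
    Hits are grouped by query and ordered by ascending e-value, then by
    descending bit score.
    """
    merged = {}
    for hits in per_chunk_hits:
        for hit in hits:
            merged.setdefault(hit[0], []).append(hit)
    for query_id, hits in merged.items():
        hits.sort(key=lambda h: (h[2], -h[3]))
        if max_hits_per_query is not None:
            merged[query_id] = hits[:max_hits_per_query]
    return merged
```

One caveat when the reference library itself is split: e-values depend on the size of the database searched, so results from different fragments are only directly comparable if every search uses the same effective database size (for example via the `-z` option of legacy `blastall`).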

BLAST is a family of programs, each of which is available as a separate workflow from the CAMERA Portal:

  • blastn (nucleotide-nucleotide BLAST): Given a DNA query, returns the most similar DNA sequences from a nucleotide reference library.
  • blastp (protein-protein BLAST): Given a protein query, returns the most similar protein sequences from a protein reference library.
  • blastx (nucleotide 6-frame translation-protein): Given a DNA query, compares its six-frame conceptual translation products against a protein reference library.
  • tblastx (nucleotide 6-frame translation-nucleotide 6-frame translation): Given a DNA query, translates the sequence in all six possible frames and compares the products against the six-frame translations of a nucleotide reference library.
  • tblastn (protein-nucleotide 6-frame translation): Given a protein query, compares it against all six reading-frame translations of a nucleotide reference library.
  • megablast (large number of query sequences): Intended for comparing a query to closely related sequences; it works best when the target percent identity is 95% or higher. megablast is much faster than running BLAST multiple times because it concatenates many input sequences into one large sequence before searching the BLAST database, then post-processes the search results to recover the individual alignments and statistical values.
  • “Blast Kegg”: Given a protein query, Blast Kegg searches against the KEGG protein library and outputs the KEGG number and its pathway/functions. Note that this workflow has no graphical output, but the results can be downloaded to your machine for viewing.
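
The “six-frame conceptual translation” used by blastx, tblastn, and tblastx can be illustrated with a short Python sketch. It assumes the standard genetic code and is not the code BLAST itself runs. The function names are chosen for illustration, and any codon containing a character other than A, C, G, or T is rendered as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame (+1, +2, +3, -1, -2, -3) to its conceptual translation."""
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(reverse[offset:])
    return frames
```

For the nine-base sequence `ATGGCCTAA`, frame +1 reads `ATG GCC TAA` and yields `MA*`, while frame -1 reads the reverse complement `TTA GGC CAT` and yields `LGH`.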

Notes:

  • CAMERA BLAST currently utilizes version 2.2.18 of NCBI BLAST.
  • Complete list of CAMERA Reference Libraries
  • For blastn and megablast, only the top alignment per hit is kept when (CAMERA_REF) NCBI Refseq Genomes (N) is used as the reference data set.
  • CAMERA uses the NCBI default blastall parameters. These, however, can be changed to better suit the nature of your query and the purpose of your search. To change these parameters, click on the “Advanced Parameters” tab on the workflow submission form.
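
For readers who want to see what such parameters look like outside the portal, here is a minimal Python sketch that assembles a command line for the legacy NCBI `blastall` program. It is an illustration only and does not show how the CAMERA submission form works internally. The helper name and the file and database names in the usage example are hypothetical. The options used are `-p` (program), `-d` (database), `-i` (query file), `-o` (output file), `-e` (expectation value cutoff, default 10.0), and `-m 8` (tabular output).

```python
BLASTALL_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def build_blastall_command(program, database, query_path, out_path,
                           evalue=10.0, tabular=False):
    """Return an argument list for legacy blastall (hypothetical helper).

    The list can be passed to subprocess.run() on a machine where
    blastall and a formatted database are installed.
    """
    if program not in BLASTALL_PROGRAMS:
        raise ValueError(f"unsupported blastall program: {program}")
    command = [
        "blastall",
        "-p", program,      # which BLAST program to run
        "-d", database,     # formatted reference library
        "-i", query_path,   # FASTA file containing the query
        "-o", out_path,     # where to write the report
        "-e", str(evalue),  # expectation value cutoff
    ]
    if tabular:
        command += ["-m", "8"]  # tab-separated output, one hit per line
    return command


# Hypothetical file and database names, for illustration only.
example = build_blastall_command(
    "blastp", "my_proteins", "query.fasta", "hits.tsv",
    evalue=1e-5, tabular=True,
)
```

Lowering the expectation value cutoff, as in the example, keeps only the more statistically significant hits.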

Resources: