🔍 TBLASTN: Protein to Translated Nucleotide BLAST

TBLASTN (Basic Local Alignment Search Tool, Protein to Translated Nucleotide) is a bioinformatics tool used to compare a protein query sequence against a nucleotide sequence database that has been translated in all six possible reading frames. It's ideal for finding genes or coding regions in unannotated nucleotide sequences using a known protein sequence.

❓ What is TBLASTN?

TBLASTN takes an amino acid (protein) query sequence and searches it against a chosen nucleotide sequence database. Each database sequence is translated in all six reading frames (three on each strand), and the query is compared with those translations. This lets you find potential coding regions in DNA/RNA databases that are homologous to your protein query.

  • Protein Query vs. Translated Nucleotide Database: Compares protein to a six-frame translated nucleotide database.
  • Gene Finding: Excellent for identifying potential genes in raw genomic or EST data.
  • Versatile Search: Can find protein homologs even if the nucleotide sequence is unannotated.
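
To make the six-frame idea concrete, here is a minimal Python sketch of how one nucleotide sequence yields six candidate protein translations. It is an illustration only, not the code TBLASTN itself runs. It assumes the standard genetic code, and the function names (`translate`, `six_frame_translate`) are made up for this example.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translate(sequence):
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    dna = sequence.upper().replace("U", "T")  # accept RNA input as well
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


for frame, protein in sorted(six_frame_translate("ATGGCCTAA").items()):
    print(f"{frame:+d}: {protein}")
```

A protein query is then compared against each of these six translations, which is why a hit can be reported on either strand and in any frame.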

🎯 Why Use TBLASTN? For Gene Discovery with Protein Evidence

TBLASTN is indispensable for:

  • 🔍 Gene Discovery: Identifying novel genes or coding regions in uncharacterized nucleotide sequences (e.g., newly sequenced genomes, ESTs, transcriptomes) using a known protein as a guide.
  • 🧬 Pseudogene Identification: Helping to distinguish functional genes from non-functional pseudogenes. Alignments interrupted by frameshifts or premature stop codons can indicate a pseudogene.
  • 📊 Functional Annotation: Inferring the function of a genomic region by finding homologous proteins.
  • 🎯 Cross-Species Homology: Using a protein from one organism to find homologous coding regions in the nucleotide databases of other organisms.
  • 📈 Sequence Validation: Confirming the coding potential of a nucleotide sequence based on protein similarity.

🧑‍💻 How to Use TBLASTN on Job Dispatcher: A Step-by-Step Guide

Follow these simple steps to perform a protein to translated nucleotide BLAST search:

1️⃣ Navigate to the Tool

  1. From the main menu, go to All Tools (or search for "TBLASTN").
  2. Click the prominent Use Tool button located next to "TBLASTN."

2️⃣ Input Your Protein Sequence

  • Locate the input box (large text area) or the "upload a Sequence File" option.

  • Paste your protein sequence(s) in FASTA format or upload a FASTA file.

    >my_protein_query
    MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
    
  • Important: You can provide a sequence either by typing into the text area OR by uploading a file, but not both simultaneously. Please clear one input to proceed.
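
A common mistake is pasting a nucleotide sequence where a protein query is expected. The sketch below shows one way to check your input before submitting. It is not part of Job Dispatcher; the helper names and the 90% threshold are assumptions chosen for illustration.

```python
# Letters that dominate nucleotide sequences (DNA, RNA, and the N wildcard).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def looks_like_nucleotide(sequence, threshold=0.9):
    """Heuristic: True if most residues are nucleotide letters."""
    if not sequence:
        return False
    hits = sum(1 for residue in sequence if residue in NUCLEOTIDE_LETTERS)
    return hits / len(sequence) >= threshold


query = ">my_protein_query\nMKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ\n"
for header, sequence in parse_fasta(query):
    if looks_like_nucleotide(sequence):
        print(f"{header}: looks like DNA/RNA, TBLASTN expects a protein query")
    else:
        print(f"{header}: looks like a protein sequence")
```

The check is only a heuristic: a short protein made up mostly of A, C, G, and T residues would be flagged incorrectly, so treat a warning as a prompt to double-check rather than a hard error.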

3️⃣ Configure Parameters

  • 📝 Title: Provide a descriptive title for your job (e.g., "My TBLASTN Search").

  • 💡 Sequence Type: Set automatically to Protein, because TBLASTN expects a protein query.

  • 🗄️ Databases: Select one or more nucleotide databases to search against.

    • Default: em_est_env, em_gss_env, em_htc_env, em_htg_env, em_pat_env, em_std_env, em_sts_env, em_tsa_env
    • (Many other options available in the Nucleotide Databases Tree on the form)
  • ⚙️ TASK (task): Choose the specific BLAST task.

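    • blastn - Default (the value shown on the form). The task list applies to nucleotide BLAST searches; TBLASTN itself always compares a protein query against translated nucleotide sequences.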
  • 📝 INCL. TAXONOMY IDs (taxids): Restrict the search to these organisms. Enter taxonomy IDs separated by commas (e.g., 9606, 10090, 7227).

  • 📝 EXCL. TAXONOMY IDs (negative_taxids): Exclude these organisms from the search. Enter taxonomy IDs separated by commas (e.g., 9606, 10090, 7227).

  • 📊 Matrix (matrix): Select the scoring matrix to use for protein alignments.

    • BLOSUM62 - Default
    • BLOSUM45, BLOSUM50, BLOSUM80, BLOSUM90
    • PAM30, PAM70, PAM250
    • NONE (M10)
  • ➖ Gap Open (gapopen): The penalty for opening a new gap.

    • Default: 11
    • Options: -1, 0, 1, ..., 25
  • ➖ Gap Extend (gapext): The penalty for extending an existing gap.

    • Default: 1
    • Options: -1, 0, 1, ..., 10
  • 📉 EXP.THR (exp): The Expectation Value (E-value) threshold. Matches with E-values higher than this will not be reported. Lower values are stricter.

    • Default: 10
    • Options: 1e-200, 1e-100, 1e-50, 1e-10, 1e-5, 1e-4, 0.001, 0.01, 0.1, 1.0, 10, 20, 50, 100, 1000
  • 🧹 FILTER (filter): Apply a low-complexity filter.

    • yes (T) - Default
    • no (F)
  • 🗑️ DROPOFF (dropoff): Dropoff value for the Gapped BLAST algorithm.

    • Default: 0
    • Options: 0, 2, 4, 6, 8, 10
  • 🔢 SCORES (scores): Maximum number of match score summaries to report.

    • Default: 50
    • Options: 0, 5, 10, 20, 50, 100, 150, 200, 250, 500, 750, 1000
  • ↔️ ALIGNMENTS (alignments): Maximum number of alignments to report.

    • Default: 50
    • Options: 0, 5, 10, 20, 50, 100, 150, 200, 250, 500, 750, 1000
  • 📏 SEQUENCE RANGE (seqrange): Define a specific range within the query sequence to search.

    • Default: START-END (entire sequence)
  • 🔢 HSPS (hsps): Maximum number of High-scoring Segment Pairs (HSPs) to report.

    • Default: 100
    • Input type: Number
  • ↔️ GAPALIGN (gapalign): Perform gapped alignments.

    • true - Default
    • false
  • 👁️ ALIGN VIEWS (align): Choose the format for displaying alignments.

    • 0 (pairwise) - Default
    • 1 (Query-anchored identities), 2 (Query-anchored non-identities), 3 (Flat query-anchored identities), 4 (Flat query-anchored non-identities)
    • 5 (BLASTXML), 6 (Tabular), 7 (Tabular with comment lines), 8 (Text ASN.1), 9 (Binary ASN.1), 10 (Comma-separated values), 11 (BLAST archive format (ASN.1)), 12 (Tabular with comment lines with btop)
  • 📊 COMPOSITION-BASED (compstats): Use composition-based statistics.

    • F - Default
    • D, 1, 2, 3
  • 📏 WORD SIZE (wordsize): The length of the word (seed) match required to initiate an alignment.

    • Default: 7
    • Input type: Number
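
If you keep a record of your settings, or want to sanity-check them before filling in the form, the defaults and options listed above can be captured in a small Python sketch. The parameter names and values are taken from this guide; `build_params` and `gap_cost` are hypothetical helpers, not part of any Job Dispatcher client. The gap cost formula assumes the usual BLAST convention, where a gap of length k costs gapopen + k × gapext.

```python
# Defaults as listed in this guide (not queried from the service).
DEFAULTS = {
    "matrix": "BLOSUM62",
    "gapopen": 11,
    "gapext": 1,
    "exp": "10",
    "filter": "T",
    "dropoff": 0,
    "scores": 50,
    "alignments": 50,
    "hsps": 100,
    "gapalign": "true",
    "align": 0,
    "compstats": "F",
    "wordsize": 7,
}

REPORT_LIMITS = {0, 5, 10, 20, 50, 100, 150, 200, 250, 500, 750, 1000}

# Allowed values for the parameters that have a fixed option list.
ALLOWED = {
    "matrix": {"BLOSUM45", "BLOSUM50", "BLOSUM62", "BLOSUM80", "BLOSUM90",
               "PAM30", "PAM70", "PAM250", "NONE"},
    "gapopen": set(range(-1, 26)),
    "gapext": set(range(-1, 11)),
    "filter": {"T", "F"},
    "scores": REPORT_LIMITS,
    "alignments": REPORT_LIMITS,
    "gapalign": {"true", "false"},
    "align": set(range(0, 13)),
    "compstats": {"F", "D", "1", "2", "3"},
}


def build_params(**overrides):
    """Return the defaults with overrides applied, rejecting unknown values."""
    params = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in DEFAULTS:
            raise KeyError(f"unknown parameter: {name}")
        allowed = ALLOWED.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{value!r} is not a listed option for {name}")
        params[name] = value
    return params


def gap_cost(params, length):
    """Cost of one gap of the given length under affine gap penalties."""
    return params["gapopen"] + length * params["gapext"]


params = build_params(matrix="BLOSUM80", exp="1e-5")
print(params["matrix"], params["exp"])
print(gap_cost(params, 3))  # 11 + 3 * 1 with the default penalties
```

With the default penalties, a three-residue gap costs 14, while three separate one-residue gaps cost 36. This is why alignments tend to favour a few longer gaps over many short ones.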

4️⃣ Submit Your Job

  • Once your sequence is entered and parameters are set, click the Submit or Run button.
  • Your job will be dispatched to the EMBL-EBI Web Service. You will be automatically redirected to a Job Status page to monitor its progress.

5️⃣ Interpret Results

  • On the results page, you will find a summary of your TBLASTN search, including a graphical overview of matches, a table of significant alignments, and detailed pairwise alignments.
  • Pay attention to the E-value and the specific reading frame in the database sequence that best matches your protein query.
  • ⭐ Tip: TBLASTN is highly effective for finding potential coding regions in raw DNA or RNA sequences when you have a protein sequence of interest.
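
If you choose one of the tabular alignment views (6 or 7), the results can be filtered with a short script. The sketch below assumes the output follows the standard 12-column BLAST+ tabular layout, so check the column order against your own results before relying on it. The sample rows and the `parse_tabular` helper are invented for illustration. For TBLASTN hits, subject coordinates refer to the nucleotide sequence, and a start greater than the end indicates a match on the reverse strand.

```python
# Column names of the standard 12-column BLAST+ tabular layout (assumed here).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_tabular(text, max_evalue=1e-5):
    """Return hits at or below max_evalue, best (lowest) E-value first."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the comment lines of view 7
        fields = line.split("\t")
        if len(fields) < len(COLUMNS):
            continue  # not a data row in the assumed layout
        hit = dict(zip(COLUMNS, fields))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hit["sstart"] = int(hit["sstart"])
        hit["send"] = int(hit["send"])
        hit["strand"] = "plus" if hit["sstart"] <= hit["send"] else "minus"
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])


# Made-up rows, used only to exercise the parser.
SAMPLE = "\n".join([
    "# Fields: query id, subject id, ...",
    "query1\tcontigA\t85.00\t120\t18\t0\t1\t120\t301\t660\t2e-60\t190",
    "query1\tcontigB\t40.00\t50\t30\t0\t10\t59\t900\t751\t0.5\t30.1",
    "query1\tcontigC\t70.00\t80\t24\t0\t5\t84\t1240\t1001\t1e-20\t95.0",
])

for hit in parse_tabular(SAMPLE):
    print(hit["sseqid"], hit["evalue"], hit["strand"])
```

Here the weak contigB match (E-value 0.5) is dropped, and the remaining hits are listed from most to least significant.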

💬 Need Help?

If you run into issues, please visit our Contact Us page for support. Happy BLASTing!