🔍 FASTA (Nucleotide-Nucleotide): Fast DNA/RNA Similarity Search

FASTA (Nucleotide-Nucleotide) is a widely used bioinformatics tool for rapidly searching a nucleotide (DNA or RNA) query sequence against a nucleotide sequence database. It identifies regions of local similarity between nucleic acid sequences, helping to infer functional or evolutionary relationships.

❓ What is FASTA (Nucleotide-Nucleotide)?

FASTA (when used for nucleotide-nucleotide searches, often denoted as FASTA-N) takes a DNA or RNA query sequence and compares it against a chosen nucleotide sequence database. It uses a heuristic algorithm to quickly find regions of high similarity, providing a list of hits and their alignments. It's a foundational tool for direct nucleic acid sequence comparison.

  • Nucleotide Query vs. Nucleotide Database: Compares DNA/RNA to DNA/RNA.
  • Local Similarity Search: Finds regions of highest similarity.
  • Rapid Homology Detection: Efficiently identifies related nucleic acids.
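
To make the heuristic less abstract, the sketch below shows the general idea of word (k-tuple) seeding: index every short word in the query, look each word of the database sequence up in that index, and count matches per diagonal (database position minus query position). A diagonal with many shared words marks a candidate region of similarity. This is a simplified toy illustration, not the FASTA implementation; the function names are hypothetical, and the later rescoring, joining, and alignment stages are left out.

```python
from collections import defaultdict


def index_ktuples(seq, ktup):
    """Map every word of length ktup in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        index[seq[i:i + ktup]].append(i)
    return index


def diagonal_hits(query, subject, ktup=6):
    """Count shared k-tuples per diagonal (subject start minus query start)."""
    query_index = index_ktuples(query.upper(), ktup)
    subject = subject.upper()
    counts = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in query_index.get(subject[j:j + ktup], ()):
            counts[j - i] += 1
    return dict(counts)


def best_diagonal(query, subject, ktup=6):
    """Return (diagonal, word_count) for the densest diagonal, or None."""
    counts = diagonal_hits(query, subject, ktup)
    if not counts:
        return None
    return max(counts.items(), key=lambda item: item[1])
```

A smaller ktup produces more word matches to examine, which is why lowering it tends to raise sensitivity at the cost of speed.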

🎯 Why Use FASTA (Nucleotide-Nucleotide)? For Quick DNA/RNA Homology

FASTA (Nucleotide-Nucleotide) is indispensable for:

  • 🔍 Gene Identification: Finding known genes or homologous sequences in a newly sequenced genome or transcript.
  • 🧬 Primer/Probe Specificity: Checking the specificity of PCR primers or hybridization probes against a genome or transcript database.
  • 📊 Sequence Verification: Confirming the identity of a cloned DNA fragment.
  • 🎯 Contamination Detection: Identifying contaminating sequences in your sample.
  • 📈 Variant Screening: Quickly screening for known sequence variants.

🧑‍💻 How to Use FASTA (Nucleotide-Nucleotide) on Job Dispatcher: A Step-by-Step Guide

Follow these simple steps to perform a nucleotide-nucleotide FASTA search:

1️⃣ Navigate to the Tool

  1. From the main menu, go to All Tools (or search for "FASTA (Nucleotide vs. Nucleotide Search)").
  2. Click the prominent Use Tool button located next to "FASTA (Nucleotide vs. Nucleotide Search)."

2️⃣ Input Your Nucleotide Sequence

  • Locate the input box (large text area) or the "upload a Sequence File" option.

  • Paste your nucleotide (DNA or RNA) sequence(s) in FASTA format or upload a FASTA file.

    >my_dna_query
    ATGGCCATGGCACTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCATG
    
  • Important: Provide your sequence either by pasting it into the text area or by uploading a file, not both. If both are filled in, clear one before submitting.
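
Before pasting or uploading, it can help to check the input locally. The sketch below parses FASTA-formatted text and reports characters that are not IUPAC nucleotide codes. The function names are hypothetical, and treating the full IUPAC nucleotide alphabet as acceptable is an assumption; the service may apply its own rules.

```python
# IUPAC nucleotide codes, including ambiguity codes. Whether the service
# accepts every one of them is an assumption.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_characters(sequence, allowed=IUPAC_NUCLEOTIDES):
    """Return the sorted characters in sequence that are not in allowed."""
    return sorted(set(sequence.upper()) - set(allowed))
```

For the example record above, parse_fasta returns a single record with the header my_dna_query, and invalid_characters returns an empty list for its sequence.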

3️⃣ Configure Parameters

  • 📝 Title: Provide a descriptive title for your job (e.g., "My FASTA-NN Search").

  • 💡 Program: Select the specific FASTA program to run.

    • FASTA - Default
    • SSEARCH (rigorous Smith-Waterman local alignment; more sensitive but slower)
    • GGSEARCH (Global-Global alignment)
    • GLSEARCH (Global-Local alignment)
  • 🗄️ Databases: Select one or more nucleotide databases to search against.

    • Default: em_est_env, em_gss_env, em_htc_env, em_htg_env, em_pat_env, em_std_env, em_sts_env, em_tsa_env
    • (Many other options available in the Nucleotide Databases Tree on the form)
  • ➕ MATCH/MISMATCH SCORES (match_scores): Define scores for matches and mismatches.

    • Default: +5/-4
    • Options: +5/-4, +3/-2, N/A (none)
  • ➖ Gap Open (gapopen): The penalty for opening a new gap.

    • Default: -10
    • Options: default (-10), 0, -1, ..., -19
  • ➖ Gap Extend (gapext): The penalty for extending an existing gap.

    • Default: -2
    • Options: default (-2), 0, -1, ..., -16
  • 📏 KTUP (ktup): The size of the word (k-tuple) used for initial seeding. Higher values are faster but less sensitive.

    • Default: 6
    • Options: 6, 5, 4, 3, 2, 1, N/A (-1)
  • 📈 EXPECTATION UPPER LIMIT (expupperlim): Maximum E-value for reported matches. Lower values are stricter.

    • Default: 10
    • Options: 1e-300, 1e-100, 1e-50, 1e-10, 1e-5, 0.001, 0.1, 1, 2, 5, 10, 20, 50
  • 📉 EXPECTATION LOWER LIMIT (explowlim): Minimum E-value for reported matches. Setting it above 0 drops hits whose E-value falls below the limit, which is useful for excluding very closely related sequences.

    • Default: 0
    • Options: 0, 1e-300, 1e-100, 1e-50, 1e-10, 1e-5, 0.001, 0.1, 1, 2, 5, 10, 20, 50
  • ↔️ STRAND (strand): For nucleotide sequences, specify the sequence strand to be used for the search.

    • both - Default
    • top
    • bottom
  • 📊 HISTOGRAM (hist): Display a histogram of scores in the FASTA result.

    • false (no) - Default
    • true (yes)
  • 🧹 FILTER (filter): Filter regions of low sequence complexity.

    • none - Default
    • dust (DUST filter)
  • 📊 STATISTICAL ESTIMATES (stats): Method for calculating statistical significance.

    • 1 (Regress) - Default
    • 2 (MLE), 3 (Altschul-Gish), 4 (Regress/shuf.), 5 (MLE/shuf.)
  • 🔢 SCORES (scores): Maximum number of match score summaries to report.

    • Default: 50
    • Options: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000
  • ↔️ ALIGNMENTS (alignments): Maximum number of alignments to report.

    • Default: 50
    • Options: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000
  • 📏 SEQUENCE RANGE (seqrange): Specify a range within the query sequence to search.

    • Default: START-END (entire sequence)
  • 🗄️ DATABASE RANGE (dbrange): Specify a length range for database sequences to search against.

    • Default: START-END (all lengths)
  • 🔢 MULTI HSPS (hsps): Display all significant High-scoring Segment Pairs (HSPs) between query and library sequence.

    • no (false) - Default
    • yes (true)
  • ⚙️ SCORE REPORT FORMAT (scoreformat): Choose the format for the score report.

    • default - Default
    • -m 8 (BLAST tabular), -m 8C (BLAST tabular with comments), and other tabular and ASN.1 formats
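
If you keep a record of your settings in a script or notebook, a small check against the options listed above can catch typos before you fill in the form. The sketch below covers only a subset of the parameters, uses the parameter names and values exactly as listed in this guide, and represents them as plain Python values, which is an assumption about convenience rather than how the form encodes them. The helper name build_parameters is hypothetical.

```python
E_VALUES = {1e-300, 1e-100, 1e-50, 1e-10, 1e-5, 0.001, 0.1, 1, 2, 5, 10, 20, 50}
REPORT_COUNTS = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100,
                 150, 200, 250, 500, 750, 1000}

# Allowed values, copied from the parameter list in this guide.
ALLOWED = {
    "ktup": {6, 5, 4, 3, 2, 1, -1},
    "strand": {"both", "top", "bottom"},
    "filter": {"none", "dust"},
    "stats": {1, 2, 3, 4, 5},
    "scores": REPORT_COUNTS,
    "alignments": REPORT_COUNTS,
    "expupperlim": E_VALUES,
    "explowlim": E_VALUES | {0},
}

# Defaults, copied from the parameter list in this guide.
DEFAULTS = {
    "ktup": 6,
    "strand": "both",
    "filter": "none",
    "stats": 1,
    "scores": 50,
    "alignments": 50,
    "expupperlim": 10,
    "explowlim": 0,
}


def build_parameters(**overrides):
    """Return the defaults updated with overrides, rejecting unlisted values."""
    params = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in ALLOWED:
            raise KeyError(f"unknown parameter: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"{value!r} is not a listed option for {name}")
        params[name] = value
    if params["explowlim"] > params["expupperlim"]:
        raise ValueError("explowlim must not exceed expupperlim")
    return params
```

For example, build_parameters(ktup=3, strand="top") keeps every other default, while build_parameters(ktup=7) raises a ValueError because 7 is not among the listed KTUP options.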

4️⃣ Submit Your Job

  • Once your sequence is entered and parameters are set, click the Submit or Run button.
  • Your job will be dispatched to the EMBL-EBI Web Service. You will be automatically redirected to a Job Status page to monitor its progress.

5️⃣ Interpret Results

  • On the results page, you will find a summary of your FASTA (Nucleotide-Nucleotide) search, including a list of significant hits, their scores, and alignments.
  • Pay attention to the E-value, which indicates the statistical significance of the match. Lower E-values are more significant.
  • ⭐ Tip: Choose the STRAND option deliberately. Searching both strands is the usual choice; restrict the search to 'top' or 'bottom' only when the orientation of the match matters for your analysis.
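
If you chose a BLAST tabular score report format, the hit list can be post-processed with a few lines of code. The sketch below assumes the standard 12-column, tab-separated BLAST tabular layout with optional comment lines starting with "#", and keeps hits whose E-value lies between a lower and an upper limit, most significant first. The column labels and function names are hypothetical, and treating both limits as inclusive is an assumption.

```python
# Standard BLAST tabular columns; the labels here are illustrative names.
COLUMNS = ("query", "subject", "identity", "length", "mismatches",
           "gap_opens", "q_start", "q_end", "s_start", "s_end",
           "evalue", "bit_score")


def parse_tabular(text):
    """Parse BLAST tabular text into a list of dicts, skipping '#' comments."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        hit = dict(zip(COLUMNS, fields))
        hit["identity"] = float(hit["identity"])
        hit["evalue"] = float(hit["evalue"])
        hit["bit_score"] = float(hit["bit_score"])
        hits.append(hit)
    return hits


def filter_by_evalue(hits, lower=0.0, upper=10.0):
    """Keep hits with lower <= E-value <= upper, sorted by ascending E-value."""
    kept = [hit for hit in hits if lower <= hit["evalue"] <= upper]
    return sorted(kept, key=lambda hit: hit["evalue"])
```

Calling filter_by_evalue(hits, upper=1e-5) keeps only strongly significant matches, while raising lower above 0 mirrors the effect of the EXPECTATION LOWER LIMIT parameter.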

💬 Need Help?

If you run into issues, please visit our Contact Us page for support. Happy FASTA searching!