🔍 FASTA (Protein-Protein): Fast Protein Similarity Search

FASTA (Protein-Protein) is a widely used bioinformatics tool for rapidly searching a protein query sequence against a protein sequence database. It's designed to identify homologous sequences and infer functional or evolutionary relationships based on local sequence similarity.

❓ What is FASTA (Protein-Protein)?

FASTA (when used for protein-protein searches, often denoted as FASTA-P) takes an amino acid (protein) query sequence and compares it against a chosen protein sequence database. It uses a heuristic algorithm to quickly find regions of high similarity, providing a list of hits and their alignments. It's a foundational tool for sequence comparison.

  • Protein Query vs. Protein Database: Compares protein to protein.
  • Local Similarity Search: Finds regions of highest similarity.
  • Rapid Homology Detection: Efficiently identifies related proteins.
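
To make the heuristic idea more concrete, here is a minimal sketch of the word (k-tuple) matching step that FASTA-style searches start from. It is an illustration only, not the FASTA implementation: the function names are hypothetical, k=2 is chosen for the example, and the real program goes on to rescore and join the best diagonals with a substitution matrix.

```python
from collections import defaultdict


def ktup_index(seq, k):
    """Map every k-letter word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query, subject, k=2):
    """Count shared k-letter words per diagonal (subject position - query position)."""
    index = ktup_index(query, k)
    hits = defaultdict(int)
    for j in range(len(subject) - k + 1):
        # .get() avoids adding empty entries to the defaultdict
        for i in index.get(subject[j:j + k], []):
            hits[j - i] += 1
    return dict(hits)


# Toy sequences: the subject contains the start of the query shifted by two residues.
hits = diagonal_hits("MGDVEKGKK", "AAMGDVEKG", k=2)
best_diagonal = max(hits, key=hits.get)
print(hits, best_diagonal)
```

A diagonal with many shared words marks a candidate region of similarity, which is why a larger word size speeds up the search while missing weaker matches.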

🎯 Why Use FASTA (Protein-Protein)? For Quick Protein Homology

FASTA (Protein-Protein) is indispensable for:

  • 🔍 Functional Annotation: Predicting the function of a novel protein by finding homologous proteins with known functions.
  • 🧬 Homology Search: Identifying evolutionarily related proteins across different species.
  • 📊 Protein Family Classification: Assigning a protein to a known family or superfamily.
  • 🎯 Initial Database Screening: Performing a fast preliminary search before more rigorous alignment.
  • 📈 Gene Discovery: Identifying new genes by searching translated genomic regions against protein databases.

🧑‍💻 How to Use FASTA (Protein-Protein) on Job Dispatcher: A Step-by-Step Guide

Follow these simple steps to perform a protein-protein FASTA search:

1️⃣ Navigate to the Tool

  1. From the main menu, go to All Tools (or search for "FASTA (Protein vs. Protein Search)").
  2. Click the prominent Use Tool button located next to "FASTA (Protein vs. Protein Search)."

2️⃣ Input Your Protein Sequence

  • Locate the input box (large text area) or the "upload a Sequence File" option.

  • Paste your protein sequence(s) in FASTA format or upload a FASTA file.

    >my_protein_query
    MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKATNE
    
  • Important: Provide the sequence either by pasting it into the text area or by uploading a file, not both. If both are filled in, clear one before submitting.
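
Before pasting, it can help to check that the input really is FASTA-formatted protein text. The following is a minimal sketch of such a check. The helper names are hypothetical, and the accepted alphabet (the 20 standard residues plus B, Z, X, U and O) is an assumption; the service may accept a different set.

```python
# Assumed alphabet: 20 standard amino acids plus common ambiguity/rare codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY" + "BZXUO")


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_residues(seq):
    """Return the sorted characters in seq that are outside the assumed alphabet."""
    return sorted(set(seq) - VALID_RESIDUES)


example = """>my_protein_query
MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKATNE
"""
for name, seq in parse_fasta(example):
    print(name, len(seq), invalid_residues(seq))
```

An empty list from `invalid_residues` means every character is in the assumed alphabet; anything else is worth fixing before you submit.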

3️⃣ Configure Parameters

  • 📝 Title: Provide a descriptive title for your job (e.g., "My FASTA-PP Search").

  • 💡 Program: Select the specific FASTA program to run.

    • FASTA - Default
    • SSEARCH (More sensitive Smith-Waterman)
    • GGSEARCH (Global-Global alignment)
    • GLSEARCH (Global-Local alignment)
  • 🗄️ Databases: Select one or more protein databases to search against.

    • Default: uniprotkb_swissprot
    • (Many other options available in the Protein Databases Tree on the form)
  • 📊 Matrix (matrix): Select the scoring matrix for protein alignments.

    • BL62 (BLOSUM62) - Default
    • BL50 (BLOSUM50)
    • BP62 (BLASTP62)
    • BL80 (BLOSUM80)
    • P250 (PAM250), P120 (PAM120)
    • M40, M20, M10 (MDM matrices)
    • VT160, VT120, VT80, VT40, VT20, VT10 (VTML matrices)
  • ➖ Gap Open (gapopen): The penalty for opening a new gap.

    • Default: -10
    • Options: default (-10), 0, -1, ..., -19
  • ➖ Gap Extend (gapext): The penalty for extending an existing gap.

    • Default: -2
    • Options: default (-2), 0, -1, ..., -16
  • 📏 KTUP (ktup): The size of the word (k-tuple) used for initial seeding. Higher values are faster but less sensitive.

    • Default: 6
    • Options: 6, 5, 4, 3, 2, 1, N/A (-1)
  • 📈 EXPECTATION UPPER LIMIT (expupperlim): Maximum E-value for reported matches. Lower values are stricter.

    • Default: 10
    • Options: 1e-300, 1e-100, 1e-50, 1e-10, 1e-5, 0.001, 0.1, 1, 2, 5, 10, 20, 50
  • 📉 EXPECTATION LOWER LIMIT (explowlim): Minimum E-value for reported matches. Raising it above 0 excludes very close matches, such as near-identical sequences, from the report.

    • Default: 0
    • Options: 0, 1e-300, 1e-100, 1e-50, 1e-10, 1e-5, 0.001, 0.1, 1, 2, 5, 10, 20, 50
  • 📊 HISTOGRAM (hist): Display a histogram of scores in the results.

    • false (no) - Default
    • true (yes)
  • 🧹 FILTER (filter): Filter regions of low sequence complexity.

    • none - Default
    • seg (SEG filter)
    • xnu (XNU filter)
    • seg+xnu
  • 📊 STATISTICAL ESTIMATES (stats): Method for calculating statistical significance.

    • 1 (Regress) - Default
    • 2 (MLE), 3 (Altschul-Gish), 4 (Regress/shuf.), 5 (MLE/shuf.)
  • 🔢 SCORES (scores): Maximum number of match score summaries to report.

    • Default: 50
    • Options: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000
  • ↔️ ALIGNMENTS (alignments): Maximum number of alignments to report.

    • Default: 50
    • Options: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000
  • 📏 SEQUENCE RANGE (seqrange): Specify a range within the query sequence to search.

    • Default: START-END (entire sequence)
  • 🗄️ DATABASE RANGE (dbrange): Specify a length range for database sequences to search against.

    • Default: START-END (all lengths)
  • 🔢 MULTI HSPS (hsps): Display all significant High-scoring Segment Pairs (HSPs) between the query and each library sequence.

    • no (false) - Default
    • yes (true)
  • 📝 ANNOTATION FEATURES (annotfeats): Turn on/off annotation features from UniProtKB.

    • no (false) - Default
    • yes (true)
  • ⚙️ SCORE REPORT FORMAT (scoreformat): Choose the format for the score report.

    • default - Default
    • -m 8 (BLAST tabular), -m 8C (BLAST tabular with comments), and various other tabular and ASN.1 formats
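
If you keep a record of the settings used for each search, the parameters above can be captured as a small dictionary. This is a sketch under stated assumptions: the keys follow the identifiers shown in parentheses in the list, the defaults and option values are copied from it, and `build_params` is a hypothetical helper. The exact values the form submits are not confirmed here.

```python
# Defaults copied from the parameter list above.
DEFAULTS = {
    "matrix": "BL62",
    "gapopen": -10,
    "gapext": -2,
    "ktup": 6,
    "expupperlim": 10,
    "explowlim": 0,
    "hist": False,
    "filter": "none",
    "stats": 1,
    "scores": 50,
    "alignments": 50,
    "hsps": False,
    "annotfeats": False,
}

# Option sets copied from the list above (assumed to be exhaustive).
ALLOWED = {
    "matrix": {"BL62", "BL50", "BP62", "BL80", "P250", "P120", "M40", "M20",
               "M10", "VT160", "VT120", "VT80", "VT40", "VT20", "VT10"},
    "ktup": {6, 5, 4, 3, 2, 1, -1},
    "filter": {"none", "seg", "xnu", "seg+xnu"},
    "stats": {1, 2, 3, 4, 5},
}


def build_params(**overrides):
    """Merge overrides into the defaults and reject values outside the listed options."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    for name, allowed in ALLOWED.items():
        if params[name] not in allowed:
            raise ValueError(f"{name}={params[name]!r} is not one of the listed options")
    if not -19 <= params["gapopen"] <= 0:
        raise ValueError("gapopen must be between -19 and 0")
    if not -16 <= params["gapext"] <= 0:
        raise ValueError("gapext must be between -16 and 0")
    if params["explowlim"] > params["expupperlim"]:
        raise ValueError("explowlim must not exceed expupperlim")
    return params


# A stricter search: different matrix and a tighter E-value ceiling.
params = build_params(matrix="BL50", expupperlim=0.001)
print(params)
```

Checking values locally like this catches typos early, for example a lower E-value limit set above the upper limit, which would leave no matches to report.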

4️⃣ Submit Your Job

  • Once your sequence is entered and parameters are set, click the Submit or Run button.
  • Your job will be dispatched to the EMBL-EBI Web Service. You will be automatically redirected to a Job Status page to monitor its progress.
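
If you script submissions instead of watching the Job Status page, the waiting step is a simple polling loop. The sketch below deliberately makes no network calls: `fetch_status` is a hypothetical callable that you would supply, and the status strings ("QUEUED", "RUNNING", "FINISHED") are assumptions, so check the service documentation for the actual values.

```python
import time

# Assumed status values that mean "still in progress".
PENDING = {"QUEUED", "RUNNING"}


def wait_for_job(job_id, fetch_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_status(job_id) until it leaves the pending states or the timeout passes."""
    elapsed = 0.0
    while True:
        status = fetch_status(job_id)
        if status not in PENDING:
            return status
        if elapsed >= timeout:
            raise TimeoutError(f"job {job_id} still {status} after {timeout} seconds")
        sleep(interval)
        elapsed += interval


# Demonstration with a fake status source and a no-op sleep, so nothing is contacted.
fake_statuses = iter(["QUEUED", "RUNNING", "FINISHED"])
final = wait_for_job("demo-job", lambda _job_id: next(fake_statuses), sleep=lambda _s: None)
print(final)
```

Passing the status function and the sleep function as arguments keeps the loop testable and leaves the choice of HTTP client to you.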

5️⃣ Interpret Results

  • On the results page, you will find a summary of your FASTA (Protein-Protein) search, including a list of significant hits, their scores, and alignments.
  • Pay attention to the E-value, which indicates the statistical significance of the match. Lower E-values are more significant.
  • ⭐ Tip: Explore the detailed alignments to understand the exact matching regions and any gaps or mismatches. If ANNOTATION FEATURES was enabled, check the "Domain Diagrams" tab for visual insights.
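
If you chose the BLAST tabular score report format (-m 8), the hit list can be filtered by E-value with a few lines of code. This is a minimal sketch that assumes the conventional 12-column BLAST tabular layout (query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The rows in `SAMPLE` are invented for illustration and are not real search output, and the function names are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    length: int
    evalue: float
    bits: float


def parse_tabular(text):
    """Parse 12-column tab-separated rows, skipping blank lines and '#' comments."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        hits.append(Hit(cols[0], cols[1], float(cols[2]), int(cols[3]),
                        float(cols[10]), float(cols[11])))
    return hits


def filter_hits(hits, upper=10.0, lower=0.0):
    """Keep hits with lower <= E-value <= upper, most significant first."""
    kept = [h for h in hits if lower <= h.evalue <= upper]
    return sorted(kept, key=lambda h: h.evalue)


# Invented rows for illustration only.
SAMPLE = (
    "# illustrative rows, not real search output\n"
    "my_protein_query\tsubject_C\t31.0\t60\t38\t3\t20\t79\t15\t72\t4.2\t28.5\n"
    "my_protein_query\tsubject_A\t98.1\t105\t2\t0\t1\t105\t1\t105\t1e-70\t250.0\n"
    "my_protein_query\tsubject_B\t45.2\t104\t55\t2\t1\t104\t3\t105\t2e-20\t95.3\n"
)

for hit in filter_hits(parse_tabular(SAMPLE), upper=0.001):
    print(hit.subject, hit.evalue)
```

Both bounds are treated as inclusive here, which is a choice made for the sketch. The `upper` and `lower` arguments play the same role as the EXPECTATION UPPER LIMIT and EXPECTATION LOWER LIMIT settings, applied after the search instead of before it.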

💬 Need Help?

If you run into issues, please visit our Contact Us page for support. Happy FASTA searching!