🔍 FASTA (Protein-Protein): Fast Protein Similarity Search
FASTA (Protein-Protein) is a widely used bioinformatics tool for rapidly searching a protein query sequence against a protein sequence database. It's designed to identify homologous sequences and infer functional or evolutionary relationships based on local sequence similarity.
❓ What is FASTA (Protein-Protein)?
FASTA (when used for protein-protein searches, often denoted as FASTA-P) takes an amino acid (protein) query sequence and compares it against a chosen protein sequence database. It uses a heuristic algorithm to quickly find regions of high similarity, providing a list of hits and their alignments. It's a foundational tool for sequence comparison.
- Protein Query vs. Protein Database: Compares protein to protein.
- Local Similarity Search: Finds regions of highest similarity.
- Rapid Homology Detection: Efficiently identifies related proteins.
🎯 Why Use FASTA (Protein-Protein)? For Quick Protein Homology
FASTA (Protein-Protein) is indispensable for:
- 🔍 Functional Annotation: Predicting the function of a novel protein by finding homologous proteins with known functions.
- 🧬 Homology Search: Identifying evolutionarily related proteins across different species.
- 📊 Protein Family Classification: Assigning a protein to a known family or superfamily.
- 🎯 Initial Database Screening: Performing a fast preliminary search before more rigorous alignment.
- 📈 Gene Discovery: Identifying new genes by searching translated genomic regions against protein databases.
🧑‍💻 How to Use FASTA (Protein-Protein) on Job Dispatcher: A Step-by-Step Guide
Follow these simple steps to perform a protein-protein FASTA search:
1️⃣ Navigate to the Tool
- From the main menu, go to All Tools (or search for "FASTA (Protein vs. Protein Search)").
- Click the prominent Use Tool button located next to "FASTA (Protein vs. Protein Search)."
2️⃣ Input Your Protein Sequence
Locate the input box (large text area) or the "upload a Sequence File" option.
Paste your protein sequence(s) in FASTA format or upload a FASTA file.
>my_protein_query
MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKATNE
Important: You can provide a sequence either by typing into the text area OR by uploading a file, but not both simultaneously. Please clear one input to proceed.
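Before pasting or uploading, it can help to check that the input is well-formed FASTA. The following is a minimal sketch and not part of Job Dispatcher: the function name parse_fasta and the permissive residue alphabet are assumptions made for illustration.

```python
# Hypothetical helper: parse FASTA text into (header, sequence) records.
# ASSUMPTION: a permissive protein alphabet (20 standard residues plus
# ambiguity codes, U/O, stop '*' and gap '-'); adjust to your needs.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBXZUO*-")


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before a '>' header line")
            seq = line.upper()
            bad = set(seq) - VALID_RESIDUES
            if bad:
                raise ValueError(f"unexpected characters: {sorted(bad)}")
            chunks.append(seq)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


query = """>my_protein_query
MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKATNE
"""
records = parse_fasta(query)
print(records[0][0], len(records[0][1]))
```

The header line must come first; sequence text without a preceding ">" line raises a ValueError in this sketch.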
3️⃣ Configure Parameters
📝 Title: Provide a descriptive title for your job (e.g., "My FASTA-PP Search").
💡 Program: Select the specific FASTA program to run.
- FASTA - Default
- SSEARCH (More sensitive Smith-Waterman)
- GGSEARCH (Global-Global alignment)
- GLSEARCH (Global-Local alignment)
🗄️ Databases: Select one or more protein databases to search against.
- Default: uniprotkb_swissprot
- (Many other options available in the Protein Databases Tree on the form)
📊 Matrix (matrix): Select the scoring matrix for protein alignments.
- BL62 (BLOSUM62) - Default
- BL50 (BLOSUM50)
- BP62 (BLASTP62)
- BL80 (BLOSUM80)
- P250 (PAM250), P120 (PAM120)
- M40, M20, M10 (MDM matrices)
- VT160, VT120, VT80, VT40, VT20, VT10 (VTML matrices)
➖ Gap Open (gapopen): The penalty for opening a new gap.
- Default: -10
- Options: default (10), 0, -1, ..., -19
➖ Gap Extend (gapext): The penalty for extending an existing gap.
- Default: -2
- Options: default (10), 0, -1, ..., -16
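To see how the two gap penalties combine, here is a small sketch of an affine gap cost. It assumes the convention that a gap of n residues contributes gapopen + n × gapext to the alignment score; the function name is hypothetical, and the exact convention applied by the server should be checked against the FASTA documentation.

```python
def affine_gap_score(length, gapopen=-10, gapext=-2):
    """Score contribution of one gap spanning `length` residues.

    ASSUMED convention: gapopen is charged once, gapext once per residue.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0  # no gap, no penalty
    return gapopen + length * gapext


for n in (1, 3, 10):
    print(n, affine_gap_score(n))
# prints:
# 1 -12
# 3 -16
# 10 -30
```

Under this convention, one long gap is penalised less than several short gaps of the same total length, because the opening penalty is paid only once.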
📏 KTUP (ktup): The size of the word (k-tuple) used for initial seeding. Higher values are faster but less sensitive.
- Default: 6
- Options: 6, 5, 4, 3, 2, 1, N/A (-1)
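The effect of the word size can be illustrated with a toy word index. This is a simplified sketch of k-tuple seeding using exact word matches only, not the FASTA implementation; the function names and the two short sequences are made up for the example.

```python
from collections import defaultdict


def index_words(seq, ktup):
    """Map each word of length ktup to the positions where it starts in seq."""
    index = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        index[seq[i:i + ktup]].append(i)
    return index


def shared_word_hits(query, subject, ktup):
    """Return (query_pos, subject_pos) pairs where identical ktup-words occur."""
    index = index_words(query, ktup)
    hits = []
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            hits.append((i, j))
    return hits


# Made-up sequences that differ near the end.
query = "MGDVEKGKKIFIMK"
subject = "MGDVEKGKKIFVQK"
for k in (1, 2, 3):
    print(k, len(shared_word_hits(query, subject, k)))
```

Larger words produce fewer seed hits, so there is less to extend (faster) but short or diverged matches can be missed (less sensitive).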
📈 EXPECTATION UPPER LIMIT (expupperlim): Maximum E-value for reported matches. Lower values are stricter.
- Default: 10
- Options: 1e-300, 1e-100, 1e-50, 1e-10, 1e-5, 0.001, 0.1, 1, 2, 5, 10, 20, 50
📉 EXPECTATION LOWER LIMIT (explowlim): Minimum E-value for reported matches. Allows excluding very closely related hits.
- Default: 0
- Options: 0, 1e-300, 1e-100, 1e-50, 1e-10, 1e-5, 0.001, 0.1, 1, 2, 5, 10, 20, 50
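Together, the two expectation limits define a window of E-values that will be reported. The sketch below shows that idea on made-up hits; the function name is hypothetical, and treating both boundaries as inclusive is an assumption.

```python
def within_expectation_limits(evalue, explowlim=0.0, expupperlim=10.0):
    """True if evalue lies inside the reporting window (bounds assumed inclusive)."""
    return explowlim <= evalue <= expupperlim


# Made-up (name, E-value) pairs for illustration only.
hits = [("hit_a", 1e-120), ("hit_b", 3e-8), ("hit_c", 0.5), ("hit_d", 25.0)]

# Raise the lower limit to drop near-identical hits, lower the upper limit
# to drop weak ones.
kept = [name for name, e in hits
        if within_expectation_limits(e, explowlim=1e-50, expupperlim=1.0)]
print(kept)  # ['hit_b', 'hit_c']
```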
📊 HISTOGRAM (hist): Display a histogram of scores in the results.
- false (no) - Default
- true (yes)
🧹 FILTER (filter): Filter regions of low sequence complexity.
- none - Default
- seg (SEG filter)
- xnu (XNU filter)
- seg+xnu
📊 STATISTICAL ESTIMATES (stats): Method for calculating statistical significance.
- 1 (Regress) - Default
- 2 (MLE), 3 (Altschul-Gish), 4 (Regress/shuf.), 5 (MLE/shuf.)
🔢 SCORES (scores): Maximum number of match score summaries to report.
- Default: 50
- Options: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000
↔️ ALIGNMENTS (alignments): Maximum number of alignments to report.
- Default: 50
- Options: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000
📏 SEQUENCE RANGE (seqrange): Specify a range within the query sequence to search.
- Default: START-END (entire sequence)
🗄️ DATABASE RANGE (dbrange): Specify a length range for database sequences to search against.
- Default: START-END (all lengths)
🔢 MULTI HSPS (hsps): Display all significant High-scoring Segment Pairs (HSPs) between query and library sequence.
- no (false) - Default
- yes (true)
📝 ANNOTATION FEATURES (annotfeats): Turn on/off annotation features from UniProtKB.
- no (false) - Default
- yes (true)
⚙️ SCORE REPORT FORMAT (scoreformat): Choose the format for the score report.
- default - Default
- Other options: -m 8 (BLAST tabular), -m 8C (BLAST tabular with comments), etc. (various tabular and ASN.1 formats)
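The defaults and option lists above can be collected into a small validation helper, which is handy when the same settings are reused across several searches. This is only a sketch: the dictionary layout and the function name build_params are assumptions, and only parameters whose values are fully listed above are checked.

```python
# Defaults as listed in this guide (a subset of the form's parameters).
DEFAULTS = {
    "matrix": "BL62",
    "gapopen": -10,
    "gapext": -2,
    "ktup": 6,
    "expupperlim": 10,
    "explowlim": 0,
    "hist": False,
    "filter": "none",
    "stats": 1,
    "scores": 50,
    "alignments": 50,
}

# Allowed values, copied from the option lists in this guide.
ALLOWED = {
    "matrix": {"BL62", "BL50", "BP62", "BL80", "P250", "P120", "M40", "M20",
               "M10", "VT160", "VT120", "VT80", "VT40", "VT20", "VT10"},
    "ktup": {6, 5, 4, 3, 2, 1, -1},
    "filter": {"none", "seg", "xnu", "seg+xnu"},
    "stats": {1, 2, 3, 4, 5},
}


def build_params(**overrides):
    """Return the defaults with overrides applied, rejecting unknown values."""
    params = dict(DEFAULTS)
    for key, value in overrides.items():
        if key not in DEFAULTS:
            raise KeyError(f"unknown parameter: {key}")
        if key in ALLOWED and value not in ALLOWED[key]:
            raise ValueError(f"{value!r} is not a listed option for {key}")
        params[key] = value
    if params["explowlim"] > params["expupperlim"]:
        raise ValueError("explowlim must not exceed expupperlim")
    return params


params = build_params(matrix="BL50", ktup=2, expupperlim=1)
print(params["matrix"], params["ktup"], params["expupperlim"])  # BL50 2 1
```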
4️⃣ Submit Your Job
- Once your sequence is entered and parameters are set, click the Submit or Run button.
- Your job will be dispatched to the EMBL-EBI Web Service. You will be automatically redirected to a Job Status page to monitor its progress.
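For scripted use, the same submit-then-monitor flow can be sketched with Python's standard library. Everything service-specific here is an assumption to verify against the EMBL-EBI web services documentation before use: the base URL, the /run and /status/{job_id} paths, the plain-text responses, and the in-progress status names. The web form steps described in this guide do not depend on it.

```python
import time
import urllib.parse
import urllib.request

# ASSUMPTION: base URL of the REST service behind the web form.
BASE_URL = "https://www.ebi.ac.uk/Tools/services/rest/fasta"


def encode_form(params):
    """URL-encode a parameter dict as a POST body (booleans become true/false)."""
    flat = {k: (str(v).lower() if isinstance(v, bool) else str(v))
            for k, v in params.items()}
    return urllib.parse.urlencode(flat).encode("utf-8")


def status_url(job_id):
    """Build the (assumed) status URL for a job identifier."""
    return f"{BASE_URL}/status/{urllib.parse.quote(job_id)}"


def submit_job(params):
    """POST the form to the assumed /run endpoint; return the job identifier."""
    request = urllib.request.Request(f"{BASE_URL}/run", data=encode_form(params))
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8").strip()


def wait_for_job(job_id, poll_seconds=5,
                 in_progress=("RUNNING", "QUEUED", "PENDING")):
    """Poll the assumed status endpoint until the job leaves in-progress states."""
    while True:
        with urllib.request.urlopen(status_url(job_id)) as response:
            status = response.read().decode("utf-8").strip()
        if status not in in_progress:
            return status
        time.sleep(poll_seconds)
```

A typical call would pass the sequence, database, and other form fields as a dictionary to submit_job and then hand the returned identifier to wait_for_job; neither function is invoked here, so running the block makes no network requests.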
5️⃣ Interpret Results
- On the results page, you will find a summary of your FASTA (Protein-Protein) search, including a list of significant hits, their scores, and alignments.
- Pay attention to the E-value, which indicates the statistical significance of the match. Lower E-values are more significant.
- ⭐ Tip: Explore the detailed alignments to understand the exact matching regions and any gaps or mismatches. If ANNOTATION FEATURES was enabled, check the "Domain Diagrams" tab for visual insights.
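If a tabular score report was requested, the hits can be ranked by E-value with a few lines of code. The sketch below assumes the standard 12-column, tab-separated BLAST tabular layout; the field names, the parse_tabular function, and the sample rows are all made up for illustration.

```python
from collections import namedtuple

# ASSUMPTION: standard 12-column BLAST tabular layout.
Hit = namedtuple(
    "Hit",
    "query subject identity length mismatches gap_opens "
    "q_start q_end s_start s_end evalue bits",
)


def parse_tabular(text):
    """Parse tab-separated hit lines and return them sorted by E-value."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        if len(f) < 12:
            raise ValueError(f"expected 12 columns, got {len(f)}")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]),
                        int(f[5]), int(f[6]), int(f[7]), int(f[8]),
                        int(f[9]), float(f[10]), float(f[11])))
    # Lower E-values are more significant, so they come first.
    return sorted(hits, key=lambda h: h.evalue)


# Made-up rows, not real search output.
rows = [
    ["my_protein_query", "subject_B", "62.5", "104", "39", "0",
     "1", "104", "1", "104", "2.1e-40", "150.2"],
    ["my_protein_query", "subject_A", "100.0", "105", "0", "0",
     "1", "105", "1", "105", "1.5e-70", "230.1"],
]
report = "\n".join("\t".join(r) for r in rows)
hits = parse_tabular(report)
print(hits[0].subject, hits[0].evalue)
```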
💬 Need Help?
If you run into issues, please visit our Contact Us page for support. Happy FASTA searching!