🔍 FASTA (Protein-Nucleotide): Fast Protein to DNA/RNA Search
FASTA (Protein-Nucleotide) is a bioinformatics tool that compares a protein query sequence against a nucleotide (DNA or RNA) sequence database. It translates the nucleotide database in all six possible reading frames and then searches your protein query against these translated sequences.
❓ What is FASTA (Protein-Nucleotide)?
In protein-nucleotide mode, FASTA runs the TFASTX or TFASTY program: it takes an amino acid (protein) query sequence and searches it against a chosen nucleotide sequence database. Each database sequence is translated in all six possible reading frames (three on each strand), so the comparison itself happens at the protein level. This allows you to find potential coding regions in DNA/RNA databases that are homologous to your protein query.
- Protein Query vs. Translated Nucleotide Database: Compares protein to a six-frame translated nucleotide database.
- Gene Finding: Excellent for identifying potential genes in raw genomic or EST data.
- Versatile Search: Can find protein homologs even if the nucleotide sequence is unannotated.
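To make the six-frame idea concrete, here is a minimal Python sketch of translating one nucleotide sequence in all six reading frames. It is an illustration only, not how TFASTX or TFASTY are implemented: the function names are hypothetical, it uses the standard genetic code (translation table 1), and it marks any codon it cannot resolve as X.

```python
# Standard genetic code (translation table 1), built from the
# conventional T, C, A, G ordering of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# RNA input is accepted: U is complemented to A like T.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Return the reverse complement as a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon from position 0; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return {frame: protein} for frames +1, +2, +3, -1, -2, -3."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])      # top strand
        frames[-(offset + 1)] = translate(rc[offset:])    # bottom strand
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
        print(frame, protein)
```

A protein query is then compared against each of the six translations, which is why a hit is reported together with the frame it was found in.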
🎯 Why Use FASTA (Protein-Nucleotide)? For Gene Discovery with Protein Evidence
FASTA (Protein-Nucleotide) is indispensable for:
- 🔍 Gene Discovery: Identifying novel genes or coding regions in uncharacterized nucleotide sequences (e.g., newly sequenced genomes, ESTs, transcriptomes) using a known protein as a guide.
- 🧬 Pseudogene Identification: Helping to distinguish functional genes from non-functional pseudogenes, since frameshifts or premature stop codons in the alignment to a known protein suggest a disrupted coding sequence.
- 📊 Functional Annotation: Inferring the function of a genomic region by finding homologous proteins.
- 🎯 Cross-Species Homology: Finding homologs of a known protein in the nucleotide databases of other organisms.
- 📈 Sequence Validation: Confirming the coding potential of a nucleotide sequence based on protein similarity.
🧑‍💻 How to Use FASTA (Protein-Nucleotide) on Job Dispatcher: A Step-by-Step Guide
Follow these simple steps to perform a protein to translated nucleotide FASTA search:
1️⃣ Navigate to the Tool
- From the main menu, go to All Tools (or search for "FASTA (Protein vs. Nucleotide Search)").
- Click the prominent Use Tool button located next to "FASTA (Protein vs. Nucleotide Search)."
2️⃣ Input Your Protein Sequence
- Locate the input box (large text area) or the "Upload a Sequence File" option.
- Paste your protein sequence(s) in FASTA format or upload a FASTA file.

>my_protein_query
MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKATNE

Important: You can provide a sequence either by typing into the text area OR by uploading a file, but not both simultaneously. Please clear one input to proceed.
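Before pasting a query, it can help to check locally that it really is well-formed protein FASTA. The following is a minimal Python sketch of such a check; the function name and the set of accepted residue letters are assumptions made for illustration, and the form may apply different or stricter rules.

```python
# Letters accepted as amino acid codes in this sketch (an assumption):
# the 20 standard residues plus the ambiguity/rare codes B, Z, X, U, O, J
# and '*' for a stop.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZXUOJ*")


def parse_protein_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Raises ValueError if sequence data precedes the first header, if a
    record is empty, or if a sequence contains unexpected characters.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before a '>' header line")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))

    for name, seq in records:
        if not seq:
            raise ValueError(f"record '{name}' has no sequence")
        unexpected = set(seq) - VALID_RESIDUES
        if unexpected:
            raise ValueError(
                f"record '{name}' contains non-protein characters: "
                f"{''.join(sorted(unexpected))}"
            )
    return records


if __name__ == "__main__":
    example = ">my_protein_query\nMGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQ\n"
    for name, seq in parse_protein_fasta(example):
        print(name, len(seq))
```

Note that the header and the sequence must be on separate lines; a sequence that runs on from the header line is treated as part of the header.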
3️⃣ Configure Parameters
📝 Title: Provide a descriptive title for your job (e.g., "My FASTA-NP Search").
💡 Program: Select the specific FASTA program to run.
- TFASTX - Default
- TFASTY (More sensitive, accounts for frameshifts)
🗄️ Databases: Select one or more nucleotide databases to search against.
- Default: em_est_env,em_gss_env,em_htc_env,em_htg_env,em_pat_env,em_std_env,em_sts_env,em_tsa_env
- (Many other options available in the Nucleotide Databases Tree on the form)
📊 Matrix (matrix): Select the scoring matrix for protein alignments.
- BL62 (BLOSUM62) - Default
- BL50 (BLOSUM50)
- BP62 (BLASTP62)
- BL80 (BLOSUM80)
- P250 (PAM250), P120 (PAM120)
- M40, M20, M10 (MDM matrices)
- VT160, VT120, VT80, VT40, VT20, VT10 (VTML matrices)
➖ Gap Open (gapopen): The penalty for opening a new gap.
- Default: -10
- Options: default (10), 0, -1, ..., -19
➖ Gap Extend (gapext): The penalty for extending an existing gap.
- Default: -2
- Options: default (10), 0, -1, ..., -16
📏 KTUP (ktup): The size of the word (k-tuple) used for initial seeding. Higher values are faster but less sensitive.
- Default: 6
- Options: 6, 5, 4, 3, 2, 1, N/A (-1)
📈 EXPECTATION UPPER LIMIT (expupperlim): Maximum E-value for reported matches. Lower values are stricter.
- Default: 10
- Options: 1e-300, 1e-100, 1e-50, 1e-10, 1e-5, 0.001, 0.1, 1, 2, 5, 10, 20, 50
📉 EXPECTATION LOWER LIMIT (explowlim): Minimum E-value for reported matches. Allows excluding very closely related hits.
- Default: 0
- Options: 0, 1e-300, 1e-100, 1e-50, 1e-10, 1e-5, 0.001, 0.1, 1, 2, 5, 10, 20, 50
↔️ STRAND (strand): For nucleotide database sequences, specify the sequence strand to be translated and used for the search.
- both - Default
- top
- bottom
📊 HISTOGRAM (hist): Display a histogram of scores in the FASTA result.
- false (no) - Default
- true (yes)
🧹 FILTER (filter): Filter regions of low sequence complexity.
- none - Default
- dust (DUST filter)
📊 STATISTICAL ESTIMATES (stats): Method for calculating statistical significance.
- 1 (Regress) - Default
- 2 (MLE)
- 3 (Altschul-Gish)
- 4 (Regress/shuf.)
- 5 (MLE/shuf.)
🔢 SCORES (scores): Maximum number of match score summaries to report.
- Default: 50
- Options: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000
↔️ ALIGNMENTS (alignments): Maximum number of alignments to report.
- Default: 50
- Options: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000
📏 SEQUENCE RANGE (seqrange): Specify a range within the query sequence to search.
- Default: START-END (entire sequence)
🗄️ DATABASE RANGE (dbrange): Specify a length range for database sequences to search against.
- Default: START-END (all lengths)
🔢 MULTI HSPS (hsps): Display all significant High-scoring Segment Pairs (HSPs) between query and library sequence.
- no (false) - Default
- yes (true)
📝 ANNOTATION FEATURES (annotfeats): Turn on/off annotation features from UniProtKB.
- no (false) - Default
- yes (true)
⚙️ SCORE REPORT FORMAT (scoreformat): Choose the format for the score report.
- default - Default
- Options: -m 8 (BLAST tabular), -m 8C (BLAST tabular with comments), etc. (various tabular and ASN.1 formats)
📚 TRANSLATION TABLE (transltable): Select the genetic code table for translating the database sequences.
- Default: 1 (Standard SGC0)
- Options: N/A (-1), 1 (Standard SGC0), 2 (Vertebrate Mitochondrial), 3 (Yeast Mitochondrial), ..., 23 (Thraustochytrium Mitochondrial)
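If you keep your search settings in a script or notebook, the parameters above can be recorded and sanity-checked before you fill in the form. The Python sketch below collects the defaults and option lists from this section into a dictionary. It is a hypothetical helper, not part of Job Dispatcher: the function name, the value types (for example True/False for yes/no options), and the rule that the lower E-value limit must be below the upper limit are all assumptions.

```python
# Defaults as listed in this guide; types are an assumption of this sketch.
DEFAULTS = {
    "program": "TFASTX",
    "matrix": "BL62",
    "gapopen": -10,
    "gapext": -2,
    "ktup": 6,
    "expupperlim": 10.0,
    "explowlim": 0.0,
    "strand": "both",
    "hist": False,
    "filter": "none",
    "stats": 1,
    "scores": 50,
    "alignments": 50,
    "hsps": False,
    "annotfeats": False,
    "transltable": 1,
}

REPORT_COUNTS = set(range(10, 101, 10)) | {150, 200, 250, 500, 750, 1000}

# Discrete option lists taken from the parameter descriptions above.
ALLOWED = {
    "program": {"TFASTX", "TFASTY"},
    "matrix": {"BL62", "BL50", "BP62", "BL80", "P250", "P120", "M40", "M20",
               "M10", "VT160", "VT120", "VT80", "VT40", "VT20", "VT10"},
    "ktup": {6, 5, 4, 3, 2, 1, -1},
    "strand": {"both", "top", "bottom"},
    "filter": {"none", "dust"},
    "stats": {1, 2, 3, 4, 5},
    "scores": REPORT_COUNTS,
    "alignments": REPORT_COUNTS,
}


def build_params(**overrides):
    """Merge overrides into the defaults and check them against the options."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    for name, allowed in ALLOWED.items():
        if params[name] not in allowed:
            raise ValueError(f"{name}={params[name]!r} is not a listed option")
    if not -19 <= params["gapopen"] <= 0:
        raise ValueError("gapopen must be between -19 and 0")
    if not -16 <= params["gapext"] <= 0:
        raise ValueError("gapext must be between -16 and 0")
    # Assumption: an empty E-value window is a mistake worth catching early.
    if not params["explowlim"] < params["expupperlim"]:
        raise ValueError("explowlim must be smaller than expupperlim")
    return params


if __name__ == "__main__":
    print(build_params(program="TFASTY", expupperlim=0.001))
```

Keeping the settings in one place like this makes it easier to repeat a search later with exactly the same parameters.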
4️⃣ Submit Your Job
- Once your sequence is entered and parameters are set, click the Submit or Run button.
- Your job will be dispatched to the EMBL-EBI Web Service. You will be automatically redirected to a Job Status page to monitor its progress.
5️⃣ Interpret Results
- On the results page, you will find a summary of your FASTA (Protein-Nucleotide) search, including a list of significant hits, their scores, and alignments.
- Pay attention to the E-value and the specific reading frame in the database sequence that best matches your protein query.
- ⭐ Tip: TFASTY is more sensitive than TFASTX for finding distant protein homologs in nucleotide sequences, especially if frameshifts are expected.
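If you selected the BLAST tabular score report format (-m 8), the hit list can be filtered by E-value with a short script. The Python sketch below assumes the conventional 12-column BLAST tabular layout (query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The function name is hypothetical, and inferring the strand from the order of the subject coordinates is an assumption that you should verify against your own output.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "query subject identity length evalue bitscore strand")


def parse_tabular(text, max_evalue=10.0):
    """Return hits with E-value <= max_evalue, best (lowest E-value) first."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (-m 8C output)
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column record
        s_start, s_end = int(fields[8]), int(fields[9])
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue
        # Assumption: reversed subject coordinates mean a bottom-strand hit.
        strand = "top" if s_start <= s_end else "bottom"
        hits.append(Hit(fields[0], fields[1], float(fields[2]),
                        int(fields[3]), evalue, float(fields[11]), strand))
    return sorted(hits, key=lambda h: h.evalue)


if __name__ == "__main__":
    # Made-up records for illustration only, not real search results.
    sample = (
        "# hypothetical example\n"
        "q1\tdbA\t85.0\t100\t15\t0\t1\t100\t301\t600\t1e-40\t160.0\n"
        "q1\tdbB\t40.0\t90\t54\t2\t5\t94\t900\t631\t0.5\t30.1\n"
        "q1\tdbC\t30.0\t50\t35\t1\t1\t50\t10\t159\t12\t20.0\n"
    )
    for hit in parse_tabular(sample, max_evalue=1.0):
        print(hit.subject, hit.evalue, hit.strand)
```

Sorting by E-value puts the most significant matches first, mirroring the order in which you would normally inspect them on the results page.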
💬 Need Help?
If you run into issues, please visit our Contact Us page for support. Happy FASTA searching!