🔍 FASTA (Nucleotide-Protein): Fast DNA/RNA to Protein Search
FASTA (Nucleotide-Protein) is a bioinformatics tool that compares a nucleotide (DNA or RNA) query sequence against a protein sequence database. It translates the nucleotide query in all six possible reading frames and then searches these translated protein sequences against the database.
❓ What is FASTA (Nucleotide-Protein)?
FASTA, when used for nucleotide-protein searches (often denoted as FASTX or FASTY), takes a DNA or RNA query sequence and translates it into six different protein sequences (three forward frames and three reverse-complement frames). It then uses each of these translated protein sequences to search against a chosen protein sequence database. This is particularly useful for identifying potential protein-coding genes in uncharacterized nucleotide sequences.
- Nucleotide Query vs. Protein Database: Translates nucleotide query to protein, then searches protein database.
- Six-Frame Translation: Considers all possible protein products from the nucleotide sequence.
- Gene Identification: Ideal for finding genes in genomic or cDNA sequences.
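The six-frame translation described above can be sketched in Python. This is a minimal illustration that assumes the standard genetic code (NCBI table 1) and the common +1/+2/+3 and -1/-2/-3 frame labels. The function names are hypothetical, and the sketch does not show how FASTX/FASTY work internally.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated in TCAG order,
# which is the order the 64-letter amino-acid string below follows.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate whole codons only; '*' = stop, 'X' = codon with an ambiguous base."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Return the three forward (+1..+3) and three reverse-complement (-1..-3) translations."""
    seq = seq.upper().replace("U", "T")  # treat RNA input as DNA
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```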
🎯 Why Use FASTA (Nucleotide-Protein)? For Gene Discovery & Annotation
FASTA (Nucleotide-Protein) is indispensable for:
- 🔍 Gene Discovery: Identifying potential protein-coding genes in novel DNA or RNA sequences (e.g., ESTs, genomic fragments).
- 🧬 Functional Annotation: Inferring the function of a gene by finding homologous proteins in databases.
- 📊 Pseudogene Identification: Helping to distinguish functional genes from non-functional pseudogenes.
- 🎯 Cross-Species Homology: Finding protein homologs from a nucleotide sequence across different organisms.
- 📈 Sequence Validation: Confirming the coding potential of a nucleotide sequence.
🧑‍💻 How to Use FASTA (Nucleotide-Protein) on Job Dispatcher: A Step-by-Step Guide
Follow these simple steps to perform a nucleotide to protein FASTA search:
1️⃣ Navigate to the Tool
- From the main menu, go to All Tools (or search for "FASTA (Nucleotide vs. Protein Search)").
- Click the prominent Use Tool button located next to "FASTA (Nucleotide vs. Protein Search)."
2️⃣ Input Your Nucleotide Sequence
Locate the input box (large text area) or the "upload a Sequence File" option.
Paste your nucleotide (DNA or RNA) sequence(s) in FASTA format or upload a FASTA file.
Example:
>my_dna_query
ATGGCCATGGCACTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCATG
Important: You can provide a sequence either by typing into the text area OR by uploading a file, but not both simultaneously. Please clear one input to proceed.
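The input rules above (FASTA format, nucleotide characters, one input source only) can be checked locally before submitting. The sketch below uses hypothetical helper names and accepts IUPAC nucleotide codes. It mirrors the form's one-input rule but does not reproduce the Job Dispatcher's actual validation.

```python
import re
from typing import List, Optional, Tuple

# IUPAC nucleotide codes (including U for RNA and N for unknown bases).
NUCLEOTIDE_RE = re.compile(r"^[ACGTURYSWKMBDHVN]+$", re.IGNORECASE)


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs; sequence lines may wrap."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def read_query(pasted: str = "", uploaded: Optional[str] = None) -> List[Tuple[str, str]]:
    """Accept pasted text OR uploaded file contents, never both."""
    has_pasted = bool(pasted.strip())
    has_upload = bool(uploaded and uploaded.strip())
    if has_pasted and has_upload:
        raise ValueError("provide a sequence by pasting OR uploading, not both")
    records = parse_fasta(pasted if has_pasted else (uploaded or ""))
    if not records:
        raise ValueError("no FASTA records found")
    for name, seq in records:
        if not NUCLEOTIDE_RE.match(seq):
            raise ValueError(f"record {name!r} is empty or has non-nucleotide characters")
    return records


print(read_query(">my_dna_query\nATGGCC\nTAGCTA\n"))
```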
3️⃣ Configure Parameters
📝 Title: Provide a descriptive title for your job (e.g., "My FASTA-PN Search").
💡 Program: Select the specific FASTA program to run.
- Default: FASTX
- Options: FASTX, FASTY (more sensitive; handles frameshifts within codons as well as between them)
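To see why frameshift handling matters, consider a coding sequence with one inserted base, such as a hypothetical sequencing error. One frame carries the protein up to the error and a different frame carries the rest, so no single translated frame matches the full protein. FASTX and FASTY can bridge such shifts within one alignment. This toy example only demonstrates the problem. It does not implement their alignment algorithm.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, offset: int = 0) -> str:
    """Translate whole codons starting at `offset` (0, 1 or 2)."""
    seq = seq[offset:]
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


coding = "ATGAAACCCGGGTTT"                 # encodes MKPGF
shifted = coding[:6] + "A" + coding[6:]    # one extra base after codon 2

print(translate(coding))        # MKPGF
print(translate(shifted, 0))    # MKTRV  (correct start, garbled tail)
print(translate(shifted, 1))    # *KPGF  (tail recovered in another frame)
```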
🗄️ Databases: Select one or more protein databases to search against.
- Default: uniprotkb_swissprot
- Many other options are available in the Protein Databases Tree on the form.
📊 Matrix (matrix): Select the scoring matrix for protein alignments (after translation).
- Default: BL62 (BLOSUM62)
- Options: BL62 (BLOSUM62), BL50 (BLOSUM50), BP62 (BLASTP62), BL80 (BLOSUM80), P250 (PAM250), P120 (PAM120), M40, M20, M10 (MDM matrices), VT160, VT120, VT80, VT40, VT20, VT10 (VTML matrices)
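A substitution matrix assigns a score to every pair of aligned amino acids. The sketch below scores an ungapped pair of translated segments using a small excerpt of BLOSUM62 that covers only M, K, R, L, I and V. A real search uses the full matrix. The excerpt and the helper names are illustrative.

```python
# Excerpt of BLOSUM62 for six residues only (M, K, R, L, I, V).
# The matrix is symmetric, so each unordered pair is stored once.
BLOSUM62_EXCERPT = {
    ("M", "M"): 5, ("M", "K"): -1, ("M", "R"): -1, ("M", "L"): 2, ("M", "I"): 1, ("M", "V"): 1,
    ("K", "K"): 5, ("K", "R"): 2, ("K", "L"): -2, ("K", "I"): -3, ("K", "V"): -2,
    ("R", "R"): 5, ("R", "L"): -2, ("R", "I"): -3, ("R", "V"): -3,
    ("L", "L"): 4, ("L", "I"): 2, ("L", "V"): 1,
    ("I", "I"): 4, ("I", "V"): 3,
    ("V", "V"): 4,
}


def pair_score(a: str, b: str) -> int:
    """Look up a substitution score regardless of argument order."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]  # KeyError for residues outside the excerpt


def ungapped_score(seg1: str, seg2: str) -> int:
    """Sum substitution scores position by position (no gaps)."""
    if len(seg1) != len(seg2):
        raise ValueError("ungapped segments must have equal length")
    return sum(pair_score(a, b) for a, b in zip(seg1, seg2))


print(ungapped_score("MKL", "MRI"))  # 5 + 2 + 2 = 9
```

Conservative substitutions such as K/R and L/I still score positively. This is how a matrix rewards distant homologs that are similar but not identical.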
➖ Gap Open (gapopen): The penalty for opening a new gap.
- Default: -10
- Options: default (-10), 0, -1, ..., -19
➖ Gap Extend (gapext): The penalty for extending an existing gap.
- Default: -2
- Options: default (-2), 0, -1, ..., -16
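This page does not say exactly how gapopen and gapext combine. Under the usual affine model, a gap's cost grows linearly with its length. Programs differ on whether the opening penalty already covers the first gap residue, so this sketch shows both conventions. The function name and the flag are hypothetical.

```python
def gap_cost(length: int, gap_open: int = -10, gap_extend: int = -2,
             open_includes_first: bool = True) -> int:
    """Score contribution of one gap of `length` residues (penalties are negative)."""
    if length <= 0:
        return 0
    if open_includes_first:
        # Convention A: the first gap residue costs gap_open, each further residue gap_extend.
        return gap_open + (length - 1) * gap_extend
    # Convention B: gap_open once, plus gap_extend for every gap residue.
    return gap_open + length * gap_extend


print(gap_cost(3))                             # -14
print(3 * gap_cost(1))                         # -30
print(gap_cost(3, open_includes_first=False))  # -16
```

Under either convention, one 3-residue gap costs much less than three separate 1-residue gaps. This is why affine penalties favor a few longer gaps over many short ones.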
📏 KTUP (ktup): The size of the word (k-tuple) used for initial seeding. Higher values are faster but less sensitive.
- Default: 6
- Options: 6, 5, 4, 3, 2, 1, N/A (-1)
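FASTA's first stage finds short exact word matches (k-tuples) between the query and a database sequence and groups them by diagonal (subject position minus query position). The sketch below, with hypothetical names, covers only that seeding step. The real program then rescores and joins the best diagonal regions and computes an optimized alignment, none of which is shown. It also shows why a larger ktup produces fewer seeds.

```python
from collections import defaultdict


def ktup_hits(query: str, subject: str, ktup: int = 2) -> dict:
    """Count shared k-tuples per diagonal (subject_pos - query_pos)."""
    index = defaultdict(list)  # k-tuple -> positions in the query
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    diagonals = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], []):
            diagonals[j - i] += 1
    return dict(diagonals)


print(ktup_hits("MKLVRA", "GGMKLVRA", ktup=2))  # {2: 5}
```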
📈 EXPECTATION UPPER LIMIT (expupperlim): Maximum E-value for reported matches. Lower values are stricter.
- Default: 10
- Options: 1e-300, 1e-100, 1e-50, 1e-10, 1e-5, 0.001, 0.1, 1, 2, 5, 10, 20, 50
📉 EXPECTATION LOWER LIMIT (explowlim): Minimum E-value for reported matches. Raising it lets you exclude very closely related hits.
- Default: 0
- Options: 0, 1e-300, 1e-100, 1e-50, 1e-10, 1e-5, 0.001, 0.1, 1, 2, 5, 10, 20, 50
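Together, the two expectation limits define a window of accepted E-values. The sketch below applies that window to made-up hits. It assumes both limits are inclusive, which this page does not state.

```python
from typing import Dict, List


def evalue_window(hits: List[Dict], explowlim: float = 0.0,
                  expupperlim: float = 10.0) -> List[Dict]:
    """Keep hits with explowlim <= E <= expupperlim, best (lowest E) first."""
    kept = [h for h in hits if explowlim <= h["evalue"] <= expupperlim]
    return sorted(kept, key=lambda h: h["evalue"])


# Hypothetical hits, for illustration only.
hits = [
    {"id": "hit_a", "evalue": 1e-120},
    {"id": "hit_b", "evalue": 3e-8},
    {"id": "hit_c", "evalue": 0.5},
    {"id": "hit_d", "evalue": 42.0},
]
print([h["id"] for h in evalue_window(hits)])  # ['hit_a', 'hit_b', 'hit_c']
print([h["id"] for h in evalue_window(hits, explowlim=1e-100, expupperlim=1e-5)])  # ['hit_b']
```

Raising explowlim to 1e-100 drops the near-identical hit_a. Lowering expupperlim to 1e-5 drops the weaker hit_c.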
↔️ STRAND (strand): For nucleotide queries, specify the sequence strand to be used for the search.
- Default: both
- Options: both, top, bottom
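This sketch assumes that "top" means the query as entered and "bottom" means its reverse complement. That is the usual convention, but this page does not confirm it. Under that assumption, the STRAND setting decides which of the six frames are searched. The hypothetical helper below returns the nucleotide sequence each selected frame reads from.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def frames_for_strand(seq: str, strand: str = "both") -> dict:
    """Map frame labels to the nucleotide sequence each selected frame reads from."""
    seq = seq.upper().replace("U", "T")
    reverse = seq.translate(COMPLEMENT)[::-1]  # bottom strand read 5'->3'
    frames = {}
    if strand in ("both", "top"):
        frames.update({f"+{k + 1}": seq[k:] for k in range(3)})
    if strand in ("both", "bottom"):
        frames.update({f"-{k + 1}": reverse[k:] for k in range(3)})
    if not frames:
        raise ValueError("strand must be 'both', 'top' or 'bottom'")
    return frames


print(sorted(frames_for_strand("ATGC", "top")))   # ['+1', '+2', '+3']
print(frames_for_strand("ATGC", "bottom")["-1"])  # GCAT
```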
📊 HISTOGRAM (hist): Display a histogram of scores in the FASTA result.
- Default: false (no)
- Options: false (no), true (yes)
🧹 FILTER (filter): Filter regions of low sequence complexity.
- Default: none
- Options: none, dust (DUST filter)
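The DUST filter flags nucleotide stretches whose triplets are highly repetitive. The sketch below is a simplified version. It assumes the classic window score, the sum of c_t(c_t - 1)/2 over triplet counts c_t divided by (l - 1) for the l triplets in a window. It also assumes a window of 64 and a threshold of 20, both borrowed from common DUST defaults. Real DUST implementations, including whatever the service runs, refine this considerably, so treat it as a toy. The function names are hypothetical.

```python
from collections import Counter


def dust_score(window: str) -> float:
    """Triplet repetitiveness of a window: sum c*(c-1)/2 over counts, divided by (l - 1)."""
    triplets = [window[i:i + 3] for i in range(len(window) - 2)]
    l = len(triplets)
    if l < 2:
        return 0.0
    return sum(c * (c - 1) / 2 for c in Counter(triplets).values()) / (l - 1)


def mask_low_complexity(seq: str, window: int = 64, threshold: float = 20.0) -> str:
    """Lower-case every window scoring above the threshold (toy sliding-window masker)."""
    seq = seq.upper()
    masked = list(seq)
    for start in range(max(len(seq) - window + 1, 1)):
        win = seq[start:start + window]
        if dust_score(win) > threshold:
            for i in range(start, start + len(win)):
                masked[i] = masked[i].lower()
    return "".join(masked)


print(dust_score("A" * 64))          # 31.0
print(mask_low_complexity("ACGTAC"))  # ACGTAC (no repetitive triplets)
```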
📊 STATISTICAL ESTIMATES (stats): Method for calculating statistical significance.
- Default: 1 (Regress)
- Options: 1 (Regress), 2 (MLE), 3 (Altschul-Gish), 4 (Regress/shuf.), 5 (MLE/shuf.)
🔢 SCORES (scores): Maximum number of match score summaries to report.
- Default: 50
- Options: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000
↔️ ALIGNMENTS (alignments): Maximum number of alignments to report.
- Default: 50
- Options: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000
📏 SEQUENCE RANGE (seqrange): Specify a range within the query sequence to search.
- Default: START-END (entire sequence)
🗄️ DATABASE RANGE (dbrange): Specify a length range for database sequences to search against.
- Default: START-END (all lengths)
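This page documents only the START-END default for these two ranges. The sketch below makes several assumptions: numeric forms such as 4-6 or 100-END are accepted, seqrange uses 1-based inclusive coordinates, and dbrange bounds are inclusive sequence lengths. All helper names are hypothetical.

```python
from typing import Tuple


def parse_seqrange(spec: str, length: int) -> Tuple[int, int]:
    """Parse 'START-END', '4-6', 'START-6' or '4-END' into 1-based inclusive bounds."""
    start_s, sep, end_s = spec.strip().partition("-")
    if not sep:
        raise ValueError(f"expected a START-END style range, got {spec!r}")
    start = 1 if start_s.upper() == "START" else int(start_s)
    end = length if end_s.upper() == "END" else int(end_s)
    if not 1 <= start <= end <= length:
        raise ValueError(f"range {spec!r} is outside 1..{length}")
    return start, end


def query_subsequence(seq: str, spec: str = "START-END") -> str:
    """Cut the query to the requested range."""
    start, end = parse_seqrange(spec, len(seq))
    return seq[start - 1:end]  # 1-based inclusive -> Python slice


def length_in_dbrange(length: int, spec: str = "START-END") -> bool:
    """True if a database sequence length lies inside the inclusive dbrange bounds."""
    low_s, _, high_s = spec.strip().partition("-")
    low = 0 if low_s.upper() == "START" else int(low_s)
    high = float("inf") if high_s.upper() == "END" else int(high_s)
    return low <= length <= high


print(query_subsequence("ATGGCCATGG", "4-6"))  # GCC
print(length_in_dbrange(150, "100-200"))       # True
```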
🔢 MULTI HSPS (hsps): Display all significant High-scoring Segment Pairs (HSPs) between the query and each library sequence.
- Default: no (false)
- Options: no (false), yes (true)
📝 ANNOTATION FEATURES (annotfeats): Turn on/off annotation features from UniProtKB.
- Default: no (false)
- Options: no (false), yes (true)
⚙️ SCORE REPORT FORMAT (scoreformat): Choose the format for the score report.
- Default: default
- Options: default; -m 8 (BLAST tabular); -m 8C (BLAST tabular with comments); and others (various tabular and ASN.1 formats)
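The -m 8 report follows the 12-column BLAST tabular layout: query id, subject id, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E-value, and bit score. The parser sketch below assumes tab-separated columns and '#' comment lines, as in -m 8C. It ignores any extra trailing columns. The strand helper is an assumption borrowed from BLASTX-style output, where qstart greater than qend marks a reverse-strand hit, so check it against your own results. The sample line is made up, not real output.

```python
from typing import Dict, List

FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_m8(text: str) -> List[Dict]:
    """Parse BLAST-tabular lines into dicts, skipping blank and '#' comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < len(FIELDS):
            raise ValueError(f"expected at least 12 columns, got {len(cols)}")
        row: Dict = dict(zip(FIELDS, cols[:len(FIELDS)]))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        for key in FLOAT_FIELDS:
            row[key] = float(row[key])
        hits.append(row)
    return hits


def query_strand(hit: Dict) -> str:
    """Assumed convention: descending query coordinates mean the reverse strand."""
    return "+" if hit["qstart"] <= hit["qend"] else "-"


sample = ("# illustrative comment line\n"
          "my_dna_query\thypothetical_protein_1\t85.50\t120\t17\t1\t3\t362\t5\t124\t2.1e-40\t150.2\n")
for hit in parse_m8(sample):
    print(hit["sseqid"], hit["evalue"], query_strand(hit))
```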
📚 TRANSLATION TABLE (transltable): Select the genetic code table for translating the query sequence.
- Default: 1 (Standard SGC0)
- Options: N/A (-1), 1 (Standard SGC0), 2 (Vertebrate Mitochondrial), 3 (Yeast Mitochondrial), ..., 23 (Thraustochytrium Mitochondrial)
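The choice of genetic code changes the protein a query translates to. The sketch below builds only NCBI table 1 and table 2. Table 2 (vertebrate mitochondrial) differs from table 1 at four codons: AGA and AGG become stops, ATA codes for M, and TGA codes for W. The other tables have further differences that are not reproduced here, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE_1 = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial code: table 1 with four codon reassignments.
TABLE_2 = {**TABLE_1, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}

GENETIC_CODES = {1: TABLE_1, 2: TABLE_2}


def translate(seq: str, table: int = 1) -> str:
    """Translate whole codons with the chosen genetic code ('X' for unknown codons)."""
    code = GENETIC_CODES[table]
    seq = seq.upper().replace("U", "T")
    return "".join(code.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


print(translate("ATATGAAGA", table=1))  # I*R
print(translate("ATATGAAGA", table=2))  # MW*
```

Translating a mitochondrial query with the default table 1 would read TGA as a stop and truncate the protein, which can hide real homologs.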
4️⃣ Submit Your Job
- Once your sequence is entered and parameters are set, click the Submit or Run button.
- Your job will be dispatched to the EMBL-EBI Web Service. You will be automatically redirected to a Job Status page to monitor its progress.
5️⃣ Interpret Results
- On the results page, you will find a summary of your FASTA (Nucleotide-Protein) search, including a list of significant hits, their scores, and alignments.
- Pay attention to the E-value and the specific reading frame from which the best match originated.
- ⭐ Tip: FASTY is more sensitive than FASTX for finding distant protein homologs from nucleotide sequences, especially if frameshifts are expected.
💬 Need Help?
If you run into issues, please visit our Contact Us page for support. Happy FASTA searching!