🔍 FASTA (Nucleotide-Nucleotide): Fast DNA/RNA Similarity Search
FASTA (Nucleotide-Nucleotide) is a widely used bioinformatics tool for rapidly searching a nucleotide (DNA or RNA) query sequence against a nucleotide sequence database. It identifies regions of local similarity between nucleic acid sequences, helping to infer functional or evolutionary relationships.
❓ What is FASTA (Nucleotide-Nucleotide)?
FASTA (when used for nucleotide-nucleotide searches, often denoted as FASTA-N) takes a DNA or RNA query sequence and compares it against a chosen nucleotide sequence database. It uses a heuristic algorithm to quickly find regions of high similarity, providing a list of hits and their alignments. It's a foundational tool for direct nucleic acid sequence comparison.
- Nucleotide Query vs. Nucleotide Database: Compares DNA/RNA to DNA/RNA.
- Local Similarity Search: Finds regions of highest similarity.
- Rapid Homology Detection: Efficiently identifies related nucleic acids.
🎯 Why Use FASTA (Nucleotide-Nucleotide)? For Quick DNA/RNA Homology
FASTA (Nucleotide-Nucleotide) is indispensable for:
- 🔍 Gene Identification: Finding known genes or homologous sequences in a newly sequenced genome or transcript.
- 🧬 Primer/Probe Specificity: Checking the specificity of PCR primers or hybridization probes against a genome or transcript database.
- 📊 Sequence Verification: Confirming the identity of a cloned DNA fragment.
- 🎯 Contamination Detection: Identifying contaminating sequences in your sample.
- 📈 Variant Screening: Quickly screening for known sequence variants.
🧑‍💻 How to Use FASTA (Nucleotide-Nucleotide) on Job Dispatcher: A Step-by-Step Guide
Follow these simple steps to perform a nucleotide-nucleotide FASTA search:
1️⃣ Navigate to the Tool
- From the main menu, go to All Tools (or search for "FASTA (Nucleotide vs. Nucleotide Search)").
- Click the prominent Use Tool button located next to "FASTA (Nucleotide vs. Nucleotide Search)."
2️⃣ Input Your Nucleotide Sequence
Locate the input box (large text area) or the "upload a Sequence File" option.
Paste your nucleotide (DNA or RNA) sequence(s) in FASTA format or upload a FASTA file.
>my_dna_query
ATGGCCATGGCACTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCATG
Important: You can provide a sequence either by typing into the text area OR by uploading a file, but not both simultaneously. Please clear one input to proceed.
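Before pasting or uploading, it can help to confirm that the input really is nucleotide FASTA. The following is a minimal sketch, not part of the tool: `parse_fasta` and `invalid_symbols` are hypothetical helper names, and the accepted alphabet is assumed to be the IUPAC nucleotide codes (including `U` for RNA).

```python
# Hypothetical pre-submission check; not part of the Job Dispatcher form.
# Assumption: valid symbols are the IUPAC nucleotide codes, including U for RNA.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before any '>' header line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_symbols(sequence):
    """Return the sorted characters that are not IUPAC nucleotide codes."""
    return sorted(set(sequence) - IUPAC_NUCLEOTIDES)


query = """>my_dna_query
ATGGCCATGGCACTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCATG
"""

for name, seq in parse_fasta(query):
    # An empty list of invalid symbols means the record looks like DNA/RNA.
    print(name, len(seq), invalid_symbols(seq))
```

A record with a non-empty `invalid_symbols` result is probably a protein sequence or contains stray characters, and is worth fixing before submission.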
3️⃣ Configure Parameters
📝 Title: Provide a descriptive title for your job (e.g., "My FASTA-NN Search").
💡 Program: Select the specific FASTA program to run.
- Default: FASTA
- Options: FASTA, SSEARCH (more sensitive Smith-Waterman), GGSEARCH (global-global alignment), GLSEARCH (global-local alignment)
🗄️ Databases: Select one or more nucleotide databases to search against.
- Default: em_est_env, em_gss_env, em_htc_env, em_htg_env, em_pat_env, em_std_env, em_sts_env, em_tsa_env
- Many other options are available in the Nucleotide Databases Tree on the form.
➕ MATCH/MISMATCH SCORES (match_scores): Define scores for matches and mismatches.
- Default: +5/-4
- Options: +5/-4, +3/-2, N/A (none)
➖ Gap Open (gapopen): The penalty for opening a new gap.
- Default: -10
- Options: default (10), 0, -1, ..., -19
➖ Gap Extend (gapext): The penalty for extending an existing gap.
- Default: -2
- Options: default (10), 0, -1, ..., -16
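To see how the match/mismatch scores and gap penalties combine, here is a minimal sketch that scores an already-aligned pair of rows using the defaults listed above (+5/-4, gap open -10, gap extend -2). The function `score_alignment` is hypothetical, and it assumes that a gap of length n costs `gap_open + n * gap_extend`; the exact convention used by the service may differ.

```python
def score_alignment(query_row, subject_row, match=5, mismatch=-4,
                    gap_open=-10, gap_extend=-2):
    """Score two equal-length alignment rows in which '-' marks a gap.

    Assumption: a gap of length n costs gap_open + n * gap_extend.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("alignment rows must have the same length")
    score = 0
    gap_in = None  # row holding the current gap run, or None outside a gap
    for q, s in zip(query_row, subject_row):
        if q == "-" and s == "-":
            raise ValueError("a column cannot contain a gap in both rows")
        if q == "-" or s == "-":
            row = "query" if q == "-" else "subject"
            if gap_in != row:
                score += gap_open  # charged once when a new gap starts
                gap_in = row
            score += gap_extend    # charged for every gapped position
        else:
            gap_in = None
            score += match if q == s else mismatch
    return score


# 7 matches and 1 mismatch: 7 * 5 - 4 = 31
print(score_alignment("ACGTACGT", "ACGAACGT"))
# 6 matches and one gap of length 2: 6 * 5 - 10 - 2 * 2 = 16
print(score_alignment("ACGTACGT", "ACG--CGT"))
```

Under this convention, raising the gap-open penalty discourages alignments with many short gaps, while raising the gap-extend penalty discourages long ones.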
📏 KTUP (ktup): The size of the word (k-tuple) used for initial seeding. Higher values are faster but less sensitive.
- Default: 6
- Options: 6, 5, 4, 3, 2, 1, N/A (-1)
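The speed and sensitivity trade-off of KTUP can be illustrated with a small sketch of exact word seeding. This is a simplified illustration of the idea, not the FASTA program's implementation; `index_words` and `seed_hits` are hypothetical names, and the two short sequences are made up for the example.

```python
from collections import defaultdict


def index_words(sequence, ktup):
    """Map every word of length ktup to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - ktup + 1):
        index[sequence[i:i + ktup]].append(i)
    return index


def seed_hits(query, subject, ktup):
    """Return (query_pos, subject_pos) pairs where a ktup-long word matches exactly."""
    index = index_words(query, ktup)
    hits = []
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], []):
            hits.append((i, j))
    return hits


# Made-up sequences that differ at two positions.
query = "ATGGCCATGGCACTAG"
subject = "TTGGCCATCGCACTAG"

for k in (6, 4, 2):
    # Smaller words produce more seed hits: more work, but more chances
    # to find a diverged region.
    print(k, len(seed_hits(query, subject, k)))
```

Every exact 6-mer match also contains matching shorter words, so lowering KTUP can only add seeds; that is why smaller values are slower but more sensitive.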
📈 EXPECTATION UPPER LIMIT (expupperlim): Maximum E-value for reported matches. Lower values are stricter.
- Default: 10
- Options: 1e-300, 1e-100, 1e-50, 1e-10, 1e-5, 0.001, 0.1, 1, 2, 5, 10, 20, 50
📉 EXPECTATION LOWER LIMIT (explowlim): Minimum E-value for reported matches. Allows excluding very closely related hits.
- Default: 0
- Options: 0, 1e-300, 1e-100, 1e-50, 1e-10, 1e-5, 0.001, 0.1, 1, 2, 5, 10, 20, 50
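The two expectation limits together define a window of E-values that are reported. The sketch below shows that logic on a made-up hit list; `filter_hits` is a hypothetical helper, and treating both limits as inclusive is an assumption.

```python
def filter_hits(hits, lower=0.0, upper=10.0):
    """Keep (name, evalue) pairs whose E-value lies within [lower, upper].

    Assumption: both limits are inclusive.
    """
    return [(name, evalue) for name, evalue in hits if lower <= evalue <= upper]


# Made-up hits for illustration only.
hits = [
    ("hit_a", 0.0),
    ("hit_b", 1e-120),
    ("hit_c", 2e-8),
    ("hit_d", 0.5),
    ("hit_e", 12.0),
]

# Defaults (0 to 10) drop only the weakest hit.
print(filter_hits(hits))
# Raising the lower limit hides near-identical matches such as hit_a and hit_b.
print(filter_hits(hits, lower=1e-100, upper=1))
```

Setting a non-zero lower limit is useful when the database contains the query itself or near-identical entries that would otherwise crowd out more distant homologs.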
↔️ STRAND (strand): For nucleotide sequences, specify the sequence strand to be used for the search.
- Default: both
- Options: both, top, bottom
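As a rough illustration of what the strand choice means, the sketch below assumes that "top" is the query as entered and "bottom" is its reverse complement. The helper names are hypothetical, and only unambiguous DNA bases plus `N` are handled.

```python
# Assumption: "top" is the query as given, "bottom" is its reverse complement.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def strands_to_search(seq, strand="both"):
    """Return the query orientation(s) implied by a strand setting."""
    if strand == "top":
        return {"top": seq}
    if strand == "bottom":
        return {"bottom": reverse_complement(seq)}
    if strand == "both":
        return {"top": seq, "bottom": reverse_complement(seq)}
    raise ValueError("strand must be 'both', 'top', or 'bottom'")


print(strands_to_search("ATGC"))
```

Searching a single strand halves the comparisons, which makes sense when the orientation of the query is already known, for example a directionally cloned transcript.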
📊 HISTOGRAM (hist): Display a histogram of scores in the FASTA result.
- Default: false (no)
- Options: false (no), true (yes)
🧹 FILTER (filter): Filter regions of low sequence complexity.
- Default: none
- Options: none, dust (DUST filter)
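To give a feel for what "low sequence complexity" means, here is a simplified DUST-style triplet score. It is a sketch of the scoring idea only: the real filter also slides windows along the sequence and applies a threshold, neither of which is shown, and `triplet_score` is a hypothetical name.

```python
from collections import Counter


def triplet_score(window):
    """DUST-style score for one window of sequence.

    Counts each overlapping triplet, sums c * (c - 1) / 2 over the counts,
    and divides by (number of triplets - 1). Repetitive windows score high.
    """
    triplets = [window[i:i + 3] for i in range(len(window) - 2)]
    if len(triplets) < 2:
        return 0.0
    counts = Counter(triplets)
    return sum(c * (c - 1) / 2 for c in counts.values()) / (len(triplets) - 1)


# A homopolymer run repeats one triplet eight times: 28 / 7 = 4.0
print(triplet_score("AAAAAAAAAA"))
# Every triplet in this window is distinct, so the score is 0.0
print(triplet_score("ACGTTGCAGT"))
```

Regions that score high under a measure like this tend to match many unrelated database sequences, which is why masking them can reduce spurious hits.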
📊 STATISTICAL ESTIMATES (stats): Method for calculating statistical significance.
- Default: 1 (Regress)
- Options: 1 (Regress), 2 (MLE), 3 (Altschul-Gish), 4 (Regress/shuf.), 5 (MLE/shuf.)
🔢 SCORES (scores): Maximum number of match score summaries to report.
- Default: 50
- Options: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000
↔️ ALIGNMENTS (alignments): Maximum number of alignments to report.
- Default: 50
- Options: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000
📏 SEQUENCE RANGE (seqrange): Specify a range within the query sequence to search.
- Default: START-END (entire sequence)
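The effect of a sequence range can be previewed locally before submitting. This is a minimal sketch assuming that a range is written as two numbers joined by a hyphen and that positions are 1-based and inclusive; `apply_seqrange` is a hypothetical helper, and the form itself may interpret ranges differently.

```python
def apply_seqrange(sequence, seqrange="START-END"):
    """Return the part of the query selected by a range such as '2-4'.

    Assumptions: positions are 1-based and inclusive, and the literal
    'START-END' means the entire sequence.
    """
    if seqrange.upper() == "START-END":
        return sequence
    start_text, end_text = seqrange.split("-", 1)
    start, end = int(start_text), int(end_text)
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("range must satisfy 1 <= start <= end <= sequence length")
    return sequence[start - 1:end]


print(apply_seqrange("ACGTACGT"))          # whole query
print(apply_seqrange("ACGTACGT", "2-4"))   # positions 2 through 4
```

Restricting the range is a quick way to search with a single exon or domain-encoding region without editing the input file.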
🗄️ DATABASE RANGE (dbrange): Specify a length range for database sequences to search against.
- Default: START-END (all lengths)
🔢 MULTI HSPS (hsps): Display all significant High-scoring Segment Pairs (HSPs) between query and library sequence.
- Default: no (false)
- Options: no (false), yes (true)
⚙️ SCORE REPORT FORMAT (scoreformat): Choose the format for the score report.
- Default: default
- Options: default, -m 8 (BLAST tabular), -m 8C (BLAST tabular with comments), and various other tabular and ASN.1 formats
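The BLAST tabular formats are convenient for downstream scripting. The sketch below parses such output under the assumption that it follows the twelve standard BLAST tabular columns (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The column names and the sample line are made up for illustration.

```python
# Assumption: the tabular output uses the twelve standard BLAST tabular columns.
COLUMNS = [
    "query", "subject", "pct_identity", "aln_length", "mismatches",
    "gap_opens", "q_start", "q_end", "s_start", "s_end", "evalue", "bit_score",
]
INT_COLUMNS = {"aln_length", "mismatches", "gap_opens",
               "q_start", "q_end", "s_start", "s_end"}
FLOAT_COLUMNS = {"pct_identity", "evalue", "bit_score"}


def parse_tabular(text):
    """Parse BLAST-style tabular text into a list of dicts, skipping '#' comments."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment line (as in the "with comments" variant)
        fields = line.split("\t")
        if len(fields) < len(COLUMNS):
            raise ValueError("expected at least 12 tab-separated fields")
        row = dict(zip(COLUMNS, fields[:len(COLUMNS)]))
        for key in INT_COLUMNS:
            row[key] = int(row[key])
        for key in FLOAT_COLUMNS:
            row[key] = float(row[key])
        rows.append(row)
    return rows


# Made-up example line, not real search output.
sample = "# comment line\nq1\ts1\t98.00\t50\t1\t0\t1\t50\t101\t150\t1e-20\t95.1\n"

for row in parse_tabular(sample):
    print(row["subject"], row["pct_identity"], row["evalue"])
```

If the downloaded file has a different column layout, adjust `COLUMNS` to match before relying on the parsed values.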
4️⃣ Submit Your Job
- Once your sequence is entered and parameters are set, click the Submit or Run button.
- Your job will be dispatched to the EMBL-EBI Web Service. You will be automatically redirected to a Job Status page to monitor its progress.
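When the same settings are reused across many searches, it can help to record them in a script and check them against the options listed in this guide before filling in the form. The sketch below is not the service's API: `build_params` is a hypothetical helper, it covers only a subset of the parameters, and the value encodings (numbers and lowercase strings) are assumptions.

```python
# Allowed values copied from the parameter lists in this guide (subset only).
REPORT_COUNTS = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100,
                 150, 200, 250, 500, 750, 1000}
ALLOWED = {
    "ktup": {6, 5, 4, 3, 2, 1, -1},
    "strand": {"both", "top", "bottom"},
    "filter": {"none", "dust"},
    "stats": {1, 2, 3, 4, 5},
    "scores": REPORT_COUNTS,
    "alignments": REPORT_COUNTS,
}
DEFAULTS = {
    "ktup": 6,
    "strand": "both",
    "filter": "none",
    "stats": 1,
    "scores": 50,
    "alignments": 50,
}


def build_params(**overrides):
    """Return the default settings with validated overrides applied."""
    params = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in ALLOWED:
            raise KeyError(f"unknown parameter: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"{value!r} is not a listed option for {name}")
        params[name] = value
    return params


print(build_params(ktup=4, filter="dust"))
```

A value that is not among the listed options, such as `ktup=7`, raises a `ValueError` here, which catches typos before a job is submitted.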
5️⃣ Interpret Results
- On the results page, you will find a summary of your FASTA (Nucleotide-Nucleotide) search, including a list of significant hits, their scores, and alignments.
- Pay attention to the E-value, which indicates the statistical significance of the match. Lower E-values are more significant.
- ⭐ Tip: Use the STRAND option carefully; searching both strands is common, but specific needs might require 'top' or 'bottom' only.
💬 Need Help?
If you run into issues, please visit our Contact Us page for support. Happy FASTA searching!