Detecting tandem repeats in genomic sequences
In a given set of genomic sequences, detect perfect or imperfect tandem repeats (TRs), which may have diverged from their ancestral repeat unit through point mutations, indels, duplications and losses of repeat units, and slippage.
Tandem repeats (TRs) complicate many downstream genomic analyses, so knowing which regions they cover helps to minimize errors in those analyses.
Genomic sequences (protein). Preferred format: FASTA.
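As an illustration of the expected input, the following minimal sketch reads FASTA-formatted text into a dictionary using only the Python standard library. The function name parse_fasta and the choice to keep only the first token of each header line as the identifier are assumptions made for this example, not part of the described workflow.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {identifier: sequence} mapping."""
    sequences: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: keep the first whitespace-delimited token as the ID
            # (assumption for this sketch); fall back to a numbered placeholder.
            parts = line[1:].split()
            current_id = parts[0] if parts else f"seq{len(sequences) + 1}"
            sequences[current_id] = ""
        elif current_id is not None:
            # Sequence line: append to the current record, normalised to upper case.
            sequences[current_id] += line.upper()
    return sequences
```

In practice the lines would typically come from an open file handle, for example parse_fasta(open(path)) with a hypothetical path.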
1. Detect potential TRs with different prediction algorithms.
2. Statistically validate all predictions.
3. For each significant prediction, construct a profile HMM based on inferred alignment of TR units.
4. Refine TR prediction using a circular profile HMM.
5. Statistically validate the final prediction.
6. Output TR information if significant in step 5.
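Steps 1 and 2 above can be illustrated with a deliberately simplified sketch. It is not the prediction algorithms or the statistical test used in the workflow: the detector below finds only perfect repeats by exhaustive search, and the validation is a plain permutation test on shuffled sequences. The function names, the parameter defaults, and the use of the covered span as test statistic are all assumptions chosen for illustration.

```python
import random
from typing import Optional, Tuple


def find_perfect_tr(seq: str, min_unit: int = 2, max_unit: int = 10,
                    min_copies: int = 3) -> Optional[Tuple[int, int, int]]:
    """Return (start, unit_length, copies) of the perfect TR covering the
    most residues, or None if no TR with at least min_copies units exists."""
    best = None
    best_span = 0
    for unit_len in range(min_unit, max_unit + 1):
        for start in range(len(seq) - unit_len * min_copies + 1):
            unit = seq[start:start + unit_len]
            copies = 1
            pos = start + unit_len
            # Extend while the next window is an exact copy of the unit.
            while seq[pos:pos + unit_len] == unit:
                copies += 1
                pos += unit_len
            span = copies * unit_len
            if copies >= min_copies and span > best_span:
                best, best_span = (start, unit_len, copies), span
    return best


def permutation_p_value(seq: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Empirical p-value: fraction of shuffled sequences whose best perfect TR
    covers at least as many residues as the best TR in the observed sequence."""
    observed = find_perfect_tr(seq)
    if observed is None:
        return 1.0
    observed_span = observed[1] * observed[2]
    rng = random.Random(seed)
    residues = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # preserves residue composition
        shuffled = find_perfect_tr("".join(residues))
        if shuffled is not None and shuffled[1] * shuffled[2] >= observed_span:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)
```

Imperfect repeats, which are the main target of the workflow, need alignment-based detectors and a model-based test; the sketch only shows the detect-then-validate pattern that steps 1 and 2 follow.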
For each sequence in which TRs were detected, we report full information on each TR: its starting position, the alignment of its TR units, the significance values for the TR test, and the maximum likelihood divergence estimate for the TR units (Schaper et al. 2012, NAR).
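One way to hold the reported fields is sketched below. The class name, the field names, the 1-based start position, and the tab-separated layout are hypothetical choices for this example; the source does not specify an output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TandemRepeatRecord:
    """Container for the information reported per detected TR (hypothetical layout)."""
    sequence_id: str
    start: int          # start position of the TR (assumed 1-based here)
    units: List[str]    # aligned TR units, with gaps written as "-"
    p_value: float      # significance value of the TR test
    divergence: float   # maximum likelihood divergence estimate for the TR units

    def to_tsv(self) -> str:
        """Serialise the record as one tab-separated line."""
        return "\t".join([
            self.sequence_id,
            str(self.start),
            ",".join(self.units),
            f"{self.p_value:.3g}",
            f"{self.divergence:.3g}",
        ])
```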