





• A motif is a short sequence element that is repeated, perhaps with variation, multiple times in a collection of sequences. • Typical motif ...
Typology: Lecture notes
We can talk about motifs from a biological or a computational standpoint.
To understand motif-finding algorithms, it will help to consider a more abstract conception of what a “motif” is.
A motif is described by a computational model that specifies how it may vary across its instances. We will consider several important questions for working with motif models:
Let’s focus for a minute on what “interesting” means.
2 A Simple Model of Motif Sequence Variation
To precisely describe motifs, we introduce two formal models: the consensus and the weight matrix model. We consider the consensus model first.
3 Another Model of Motif Sequence Variation
The consensus model, while appealing to people who like combinatorics, is limited in its ability to describe variation in a motif.
To capture these position- and sequence-dependent effects, we introduce the weight matrix model (WMM). For simplicity, we will assume that motifs modeled by a WMM have fixed length: their instances cannot exhibit insertions and deletions relative to the model.
A WMM for a motif of length ℓ is a 4 × ℓ matrix W of probabilities. The four rows of W are labeled with the four DNA bases, while the columns are labeled 1...ℓ. For example, with ℓ = 6:

       1      2      3      4      5      6
a     0.4    0.1    0.25   0.3    0.1    0.1
c     0.1    0.7    0.25   0.2    0.1    0.4
g     0.4    0.1    0.25   0.2    0.1    0.1
t     0.1    0.1    0.25   0.3    0.7    0.4
Once again, let’s consider our fundamental questions about motif models.
Pr(acagtc | W) = 0.4 × 0.7 × 0.25 × 0.2 × 0.7 × 0.4 ≈ 0.004.
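The calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary layout for W and the function name `wmm_probability` are assumptions made here, not part of the notes.

```python
# Example weight matrix from the text: one row per base, one column per position.
W = {
    "a": [0.4, 0.1, 0.25, 0.3, 0.1, 0.1],
    "c": [0.1, 0.7, 0.25, 0.2, 0.1, 0.4],
    "g": [0.4, 0.1, 0.25, 0.2, 0.1, 0.1],
    "t": [0.1, 0.1, 0.25, 0.3, 0.7, 0.4],
}


def wmm_probability(seq, W):
    """Return Pr(seq | W): the product of the per-position base probabilities."""
    length = len(next(iter(W.values())))
    if len(seq) != length:
        # Fixed-length model: no insertions or deletions are allowed.
        raise ValueError("sequence length must equal the number of columns in W")
    prob = 1.0
    for i, base in enumerate(seq.lower()):
        prob *= W[base][i]
    return prob


p = wmm_probability("acagtc", W)
print(round(p, 5))  # 0.00392
```

Because each position contributes an independent factor, the cost of scoring one candidate instance is linear in the motif length.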
$$W(c, i) = \frac{\text{\# of instances with base } c \text{ at position } i}{\text{total \# of instances}}$$
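As a rough sketch, this frequency estimate might be computed as follows. The function name `estimate_wmm` and the four example instances are hypothetical, invented here purely for illustration.

```python
from collections import Counter

BASES = "acgt"


def estimate_wmm(instances):
    """Estimate W(c, i) as (# instances with base c at position i) / (total # instances)."""
    if not instances:
        raise ValueError("need at least one instance")
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all instances must have the same length")
    total = len(instances)
    W = {c: [0.0] * length for c in BASES}
    for i in range(length):
        # Count how often each base occurs in column i of the instances.
        counts = Counter(s[i].lower() for s in instances)
        for c in BASES:
            W[c][i] = counts[c] / total
    return W


# Hypothetical motif instances (made up for this example).
instances = ["acgt", "aagt", "acgt", "tcga"]
W_est = estimate_wmm(instances)
print(W_est["a"])  # [0.75, 0.25, 0.0, 0.25]
```

Note that a base never observed at some position receives probability zero under this plain frequency estimate, so any sequence with that base at that position would also get probability zero.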
How do we measure how interesting a putative motif is in this model?
$$\sum_{j} \log \frac{\Pr(s_j \mid W)}{\Pr(s_j \mid B)}$$
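A minimal sketch of this log-likelihood ratio score is given below. Several details are assumptions made for the example rather than statements from the notes: B is taken to be a background model that emits each base independently with probability 0.25, the logarithm is the natural log, and the function names and the two instances are hypothetical.

```python
import math

# Example weight matrix from the text.
W = {
    "a": [0.4, 0.1, 0.25, 0.3, 0.1, 0.1],
    "c": [0.1, 0.7, 0.25, 0.2, 0.1, 0.4],
    "g": [0.4, 0.1, 0.25, 0.2, 0.1, 0.1],
    "t": [0.1, 0.1, 0.25, 0.3, 0.7, 0.4],
}

# Assumed background model B: uniform, independent bases (not specified in the notes).
BACKGROUND = {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25}


def log_prob_wmm(seq, W):
    """log Pr(seq | W), summing per-position log probabilities."""
    return sum(math.log(W[base][i]) for i, base in enumerate(seq))


def log_prob_background(seq, bg):
    """log Pr(seq | B) under an independent-base background."""
    return sum(math.log(bg[base]) for base in seq)


def llr_score(instances, W, bg):
    """Sum over instances s_j of log [ Pr(s_j | W) / Pr(s_j | B) ]."""
    return sum(log_prob_wmm(s, W) - log_prob_background(s, bg) for s in instances)


# Hypothetical instances (made up for this example).
instances = ["acagtc", "gcctta"]
print(llr_score(instances, W, BACKGROUND))
```

Working with sums of logs instead of products of probabilities avoids numerical underflow when there are many instances. A positive score means the instances are, taken together, more probable under W than under the background.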
Finding the best possible motif in a set of sequences is a computationally hard problem!
5 Using Homologous Sequences in Motif Finding
Conservation can sometimes provide an additional signal to make motif finding easier.
The footprinting approach is subject to several challenges.