



Main points of this exam paper are: Suffix Tree, Permuting Symbols, Determines Whether, Special Symbol, Maximal Repeat, Subtree, Outgoing Edges
Typology: Exams
CSE3358 Problem Set 9 Solution
Problem 1: Relaxed string matching (20 points) We have seen in class a number of algorithms for the string matching problem where, given a text T[1..n] over some finite alphabet Σ and a pattern P[1..m], we need to determine all positions in T where P occurs (called valid shifts). The naive algorithm runs in O(mn) time. The Rabin-Karp algorithm runs in O(n + mv) expected time, where v is the number of valid shifts. The suffix tree algorithm runs in O(n + m + v) = O(n + m) time. In this problem we consider a relaxed version of the string matching problem that can be solved in linear time.
Let T_i^m = T[i..i + m − 1]. We say T_i^m is an anagram of P if there is a way of permuting the symbols of T_i^m so that the resulting array is equal to P.
(a) (10 points) Give an algorithm that, given an index i, determines whether T_i^m is an anagram of P. Your algorithm should be asymptotically as efficient as possible.
ANSWER: Let |Σ| = k. Initialize an array A[1..k] to all zeros, where A[j] will eventually hold the number of times symbol j appears in T_i^m. A can be computed in linear time, just as in counting sort: go through T_i^m once and increment the corresponding count in A. Create a similar array B for P, then compare A and B in O(k) time, which is constant because the alphabet is fixed. The whole algorithm takes O(m + k) = O(m) time.
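The counting idea in part (a) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `is_anagram_at` is hypothetical, the index `i` is 0-based (the problem statement uses 1-based indices), and `collections.Counter` (a hash map) stands in for the arrays A and B of size k.

```python
from collections import Counter


def is_anagram_at(text: str, pattern: str, i: int) -> bool:
    """Return True if text[i : i + m] is an anagram of pattern (0-based i)."""
    m = len(pattern)
    # A window that does not fit inside the text cannot match.
    if i < 0 or i + m > len(text):
        return False
    # Count symbol occurrences in the window and in the pattern,
    # in the spirit of the counting step of counting sort.
    window_counts = Counter(text[i:i + m])
    pattern_counts = Counter(pattern)
    return window_counts == pattern_counts
```

Building both counters takes O(m) time and comparing them touches at most k distinct symbols, which matches the O(m + k) bound of the answer.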
(b) (10 points) Give an algorithm that, given T and P, reports all i’s such that T_i^m is an anagram of P. Your algorithm should run in O(n + m) time.
ANSWER: Compute A for T_1^m and B for P as before. This takes O(m) time. Compare the two in O(k) time to determine whether T_1^m is an anagram of P. Now we can compute A for T_2^m in constant time: simply decrease the count for T[1] in A and increase the count for T[m + 1] in A. B remains the same. Then compare A and B again in O(k) time to determine whether T_2^m is an anagram of P, and so on. This algorithm takes O(m + nk) = O(m + n) time, since k is a constant.
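A minimal Python sketch of this sliding-window update is given below. The function name `anagram_shifts` and the default lowercase alphabet are assumptions made for illustration; shifts are reported 0-based rather than 1-based.

```python
def anagram_shifts(text, pattern, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Return all 0-based shifts i where text[i : i + m] is an anagram of pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Map each symbol to its slot in the count arrays (assumed alphabet).
    index = {symbol: slot for slot, symbol in enumerate(alphabet)}
    a = [0] * len(alphabet)  # counts for the current window of the text
    b = [0] * len(alphabet)  # counts for the pattern (never changes)
    for symbol in pattern:
        b[index[symbol]] += 1
    for symbol in text[:m]:
        a[index[symbol]] += 1
    shifts = [0] if a == b else []
    for i in range(1, n - m + 1):
        a[index[text[i - 1]]] -= 1      # symbol leaving the window
        a[index[text[i + m - 1]]] += 1  # symbol entering the window
        if a == b:                      # O(k) comparison
            shifts.append(i)
    return shifts
```

For example, `anagram_shifts("cbaebabacd", "abc")` returns `[0, 6]`, since only the windows `cba` and `bac` are permutations of `abc`.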
Problem 2: Finding maximal repeats (20 points) Consider a string T [1..n]. Define T [0] and T [n + 1] as a special symbol that does not occur in T. Consider a string P [1..m]. We say P is a maximal repeat of T if ∃ 0 < i, j ≤ n such that:
(a) (5 points) Based on the above definition, explain in plain English what a maximal repeat is.
ANSWER: A maximal repeat P of length m is a string that occurs at least twice in T such that one of the occurrences, say P = T_i^m, cannot be extended to the left or right and still repeat.
(b) (10 points) Consider the suffix tree of T. Show that P is a maximal repeat of T is equivalent to ∃ v in the suffix tree of T such that:
The first direction is trivial:
If P is a maximal repeat, then there must exist two suffixes, say i and j, that start with P such that T[i − 1] ≠ T[j − 1] and T[i + m] ≠ T[j + m]. Since there must be two paths in the suffix tree that spell suffix i and suffix j, and no two outgoing edges on a node can start with the same symbol, there must be a node v where the two paths diverge for the first time. The path to v must spell P. Moreover, v is definitely not a leaf because it has at least two outgoing edges. In v’s subtree, we must reach leaf i and leaf j.
Now the other direction:
If such a node v exists, then consider leaf i and leaf j in v’s subtree. Let e_1 be the outgoing edge of v that leads to leaf i. Let e_2 be the outgoing edge of v that leads to leaf j. If e_1 ≠ e_2, then we are done because we have two suffixes i and j that diverge after P and T[i − 1] ≠ T[j − 1]. If e_1 = e_2, then there must be another edge going out of v because any internal node in the suffix tree must have at least two outgoing edges. Therefore, there must be another leaf k in v’s subtree. Now either T[k − 1] ≠ T[i − 1] or T[k − 1] ≠ T[j − 1], because it cannot be equal to both since T[i − 1] ≠ T[j − 1]. Without loss of generality, let’s say T[k − 1] ≠ T[i − 1]. Then we have leaves i and k that are reached from v on two different edges, and we are back to the previous case.
(c) (5 points) Show that a string T of length n has O(n) maximal repeats.
ANSWER: Since a maximal repeat corresponds to a node in the suffix tree, and the suffix tree has O(n) nodes, then we have O(n) maximal repeats.
ANSWER: In this case, we can prove that there is always a solution that contains the lightest item. This can be done by a cut and paste argument to justify the greedy choice. Consider the items sorted by their weight. Assume we have an optimal solution that does not contain item 1. Then let i > 1 be the lightest item in the solution. Cut i and paste 1. Definitely, we do not exceed the weight limit because item 1 is not heavier than item i. Also, we still get an optimal solution because v_1 ≥ v_i. This argument can be carried further to say that item 2 must be in the solution if the solution has at least two items, etc.
Therefore, in this case a greedy algorithm will do the job. Sort the objects by increasing order of their weight (i.e. decreasing order of their value). Then pick objects in order (most valuable first) until you cannot add any more objects. The running time of this algorithm is dominated by sorting and hence it is O(n log n), which is independent of W.
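The greedy procedure can be sketched as below. The problem statement is not available in this excerpt, so the sketch assumes what the answer implies: a 0-1 selection under a weight limit W, where sorting by increasing weight is the same as sorting by decreasing value. The function name `greedy_knapsack` and the `(weight, value)` tuple layout are hypothetical.

```python
def greedy_knapsack(items, capacity):
    """Pick items lightest-first until the next one no longer fits.

    items: list of (weight, value) pairs; assumed to satisfy the special
    case above, where a lighter item is never less valuable.
    """
    chosen = []
    remaining = capacity
    # Sorting dominates the running time: O(n log n), independent of capacity.
    for weight, value in sorted(items, key=lambda item: item[0]):
        if weight > remaining:
            break  # every later item is at least as heavy, so stop here
        chosen.append((weight, value))
        remaining -= weight
    return chosen
```

With `items = [(3, 5), (1, 9), (2, 7), (4, 1)]` and `capacity = 6`, the sketch selects the three lightest items, which are also the three most valuable.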
Problem 4: Largest common circuit (20 points) Define a circuit as a logic consisting of AND, OR, and NOT gates connected together such that:
The problem: We have two circuits as described above, one with m gates and one with n gates. We would like to obtain the largest circuit (in terms of the number of gates) that is a sub-circuit of both.
(a) (5 points) Give a brute force algorithm for finding the largest common circuit of two circuits and analyze its running time.
ANSWER: The first observation that we have to make is that each circuit can be represented by a binary tree where each node has one of three types: AND, OR, or NOT. Now the problem is: given two binary trees, find the largest tree that is a subtree of both.
One way is to consider every subtree of T_1, check whether it is a subtree of T_2, and keep track of the largest so far. But there are too many subtrees, and each check takes a long time. Here’s why: Consider T_1 to be a complete binary tree of height h; then any tree of height h is a subtree of T_1. How many trees of height h can we have? Definitely exponential. Let S(h) be the number of binary trees of height h. S(−1) = 1 (empty tree), S(0) = 1 (one node), and S(1) = 3. Also, S(h) ≥ S(h − 1)^2 because a tree of height h can consist of a root with two subtrees of height h − 1. Therefore, S(h) ≥ 3^(2^(h−1)) for h ≥ 1. But h is Θ(log n) for a complete tree: n = 2^(h+1) − 1, so 2^(h−1) = (n + 1)/4 and S(h) ≥ 3^((n+1)/4), which is exponential in n.
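The growth of S(h) can be checked numerically for small heights. The sketch below is an illustration, not part of the original solution: it assumes ordered binary trees (left and right children are distinguished) and uses the exact recurrence S(h) = S(h − 1)^2 + 2 · S(h − 1) · (S(−1) + ... + S(h − 2)), which counts the trees whose taller subtree has height exactly h − 1. The function name `count_trees_by_height` is hypothetical.

```python
def count_trees_by_height(max_height):
    """Return [S(0), ..., S(max_height)] for ordered binary trees."""
    counts = []
    prev = 1     # S(h - 1), starting from S(-1) = 1 (the empty tree)
    shorter = 0  # sum of S(k) over all k < h - 1
    for _ in range(max_height + 1):
        # Both subtrees have height h - 1, or exactly one does and the
        # other is strictly shorter (two ways to place the taller one).
        current = prev * prev + 2 * prev * shorter
        counts.append(current)
        shorter += prev
        prev = current
    return counts
```

The first values are 1, 3, 21, 651, and each one is at least the square of its predecessor, which is the inequality the lower bound relies on.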
Moreover, given a tree, to check whether it is a subtree of T_2, we have to place the root at some node and then make sure that there is a match for every node. This is easy if the tree is ordered, but since it is not, at any node we can map either the left child to the left child and the right child to the right child, or vice versa. This gives two choices at each level for every node, which yields 2 · 4 · 8 ⋯ = O(2^(n^2)) choices.
PS: if the trees were ordered trees, then it would be much simpler. Just pick a node i from T_1 and a node j from T_2, perform a tree walk on i’s subtree in T_1, and follow the same steps (if they exist) in T_2, marking those nodes that are common. This would reveal the largest common subtree that maps i and j to the root of the common subtree. This takes linear time. We do this for every i and j, and we keep track of the best one so far. With a total of O(mn) walks (i, j pairs) we end up with an O(mn^2) algorithm. Part (b) suggests an even better algorithm.
(b) (15 points) Give a dynamic programming formulation of the problem and design an algorithm (based on dynamic programming) for finding the largest common circuit of two circuits that runs in O(mn) time.
ANSWER: Let c(i, j) be the size of the largest common tree that maps node i from T_1 and node j from T_2 to the root of the common tree. Also let T_1(i) be the type of node i in T_1 and similarly T_2(j) be the type of node j in T_2. Then we have an optimal substructure such that:
\[
c(i, j) =
\begin{cases}
0 & \text{if } T_1(i) \neq T_2(j) \\
1 + \max\bigl(c(\mathrm{left}[i], \mathrm{left}[j]) + c(\mathrm{right}[i], \mathrm{right}[j]),\; c(\mathrm{left}[i], \mathrm{right}[j]) + c(\mathrm{right}[i], \mathrm{left}[j])\bigr) & \text{otherwise}
\end{cases}
\]
The values for c(i, j) can be recursively obtained in a table of size mn. Then we would pick the entry with max_{i,j} c(i, j), and backtracking from that entry in the table would give us the largest common tree.
The running time of this algorithm is obviously O(mn) because we have O(mn) entries to compute and each requires constant time.
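A memoized version of the recurrence for c(i, j) can be sketched in Python as follows. Several details are assumptions for illustration: the `Gate` class and the function name are hypothetical, a missing child is represented by `None` with c = 0 (the recurrence above leaves this base case implicit), and the sketch returns only the size of the largest common tree, without the backtracking step.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be cache keys
class Gate:
    kind: str  # "AND", "OR", or "NOT"
    left: Optional["Gate"] = None
    right: Optional["Gate"] = None


def largest_common_circuit_size(root1, root2):
    """Return max over all node pairs (i, j) of c(i, j)."""

    def nodes(root):
        # Collect every node of a tree with an explicit stack.
        stack, found = [root], []
        while stack:
            node = stack.pop()
            if node is not None:
                found.append(node)
                stack.extend((node.left, node.right))
        return found

    @lru_cache(maxsize=None)
    def c(i, j):
        # Assumed base case: a missing node or a type mismatch contributes 0.
        if i is None or j is None or i.kind != j.kind:
            return 0
        straight = c(i.left, j.left) + c(i.right, j.right)
        swapped = c(i.left, j.right) + c(i.right, j.left)
        return 1 + max(straight, swapped)

    return max((c(i, j) for i in nodes(root1) for j in nodes(root2)), default=0)
```

Each pair (i, j) is evaluated once thanks to the cache and does a constant amount of work beyond its four lookups, which mirrors the O(mn) table argument. Very deep trees would need an iterative, bottom-up fill instead of recursion.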
Problem 5: Spanning trees (20 points)
(a) (10 points) Let w_max be the maximum weight of an edge in a spanning tree. A bottleneck spanning tree is a spanning tree whose w_max is minimum over all spanning trees. Use a “cut and paste” argument to show that a minimum spanning tree is a bottleneck spanning tree.
ANSWER: We show that a minimum spanning tree also minimizes w_max, using a cut and paste argument. Consider a minimum spanning tree T whose maximum-weight edge is e = (u, v). Suppose there is another spanning tree T′ whose maximum-weight edge e′ satisfies w(e′) < w(e). We will reach a contradiction. First, note that e cannot be in T′ because e′ is the edge with maximum weight in T′ and w(e′) < w(e). Cut e = (u, v) from T. This disconnects T into two components, one containing u and the other containing v. Since T′ is a spanning tree that does not contain e, there must be a path in T′ between u and v that does not use e. There must be an edge e″ on this path that crosses between the two components and therefore reconnects T. Paste e″ into T. Then (T − {e}) ∪ {e″} is a spanning tree with weight less than that of T because w(e″) ≤ w(e′) < w(e), a contradiction.
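The claim can be sanity-checked on a small graph by comparing the heaviest edge of a minimum spanning tree with the best bottleneck found by brute force. This is a sketch for illustration only: it uses Kruskal’s algorithm, which the solution does not name, and all function names and the `(weight, u, v)` edge layout are assumptions.

```python
from itertools import combinations


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def kruskal(n, edges):
    """Return the edges of a minimum spanning tree of a connected graph."""
    parent = list(range(n))
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:  # the edge joins two components, so it is safe to add
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def is_spanning_tree(n, subset):
    """n - 1 edges that never close a cycle form a spanning tree."""
    parent = list(range(n))
    for _, u, v in subset:
        ru, rv = find(parent, u), find(parent, v)
        if ru == rv:
            return False
        parent[ru] = rv
    return len(subset) == n - 1


def min_bottleneck_brute_force(n, edges):
    """Smallest possible w_max over all spanning trees (exponential time)."""
    return min(
        max(w for w, _, _ in subset)
        for subset in combinations(edges, n - 1)
        if is_spanning_tree(n, subset)
    )
```

On the 4-vertex graph with edges `[(1, 0, 1), (4, 1, 2), (3, 2, 3), (2, 0, 3), (5, 0, 2)]`, both the heaviest edge of the tree returned by `kruskal` and the brute-force bottleneck equal 3.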