Suffix Tree - Data Structures - Solved Problems | Exams Data Structures and Algorithms

CSE3358 Problem Set 9

Solution

Problem 1: Relaxed string matching (20 points)

We have seen in class a number of algorithms for the string matching problem where, given a text
T[1..n] over some finite alphabet Σ and a pattern P[1..m], we need to determine all positions in T
where P occurs (called valid shifts). The naive algorithm runs in O(mn) time. The Rabin-Karp
algorithm runs in O(n + mv) expected time, where v is the number of valid shifts. The suffix tree
algorithm runs in O(n + m + v) = O(n + m) time, since v ≤ n. In this problem we consider a relaxed
version of the string matching problem that can be solved in linear time.

Let $T_i^m = T[i..i+m-1]$. We say $T_i^m$ is an anagram of P if there is a way of permuting the
symbols of $T_i^m$ so that the resulting array is equal to P.

(a) (10 points) Give an algorithm that, given an index i, determines whether $T_i^m$ is an anagram
of P. Your algorithm should be asymptotically as efficient as possible.

ANSWER: Let |Σ| = k. Initialize an array A[1..k] to all zeros, where A[j] will eventually hold the
number of times symbol j appears in $T_i^m$. A can be computed in linear time, just as in counting
sort: go through $T_i^m$ once and increment the corresponding count in A. Create a similar array B
for P, then compare A and B in O(k) time, which is constant for a fixed alphabet. The whole
algorithm therefore takes O(m + k) = O(m) time.
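The counting approach above can be sketched in Python as follows. This is a minimal illustration, not part of the original solution: the function names, the default lowercase alphabet, and the choice to return False for an out-of-range index are all assumptions made for the example. The index i is 1-based, as in the problem statement.

```python
def count_symbols(s, index):
    """Return a list of counts, one slot per alphabet symbol (as in counting sort)."""
    counts = [0] * len(index)
    for ch in s:
        counts[index[ch]] += 1  # raises KeyError if ch is outside the assumed alphabet
    return counts


def is_anagram_at(T, P, i, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Check whether T[i..i+m-1] (1-indexed) is an anagram of P in O(m + k) time."""
    m = len(P)
    if i < 1 or i + m - 1 > len(T):
        return False  # assumption: a window that does not fit in T never matches
    index = {ch: j for j, ch in enumerate(alphabet)}  # symbol -> slot, O(k)
    A = count_symbols(T[i - 1:i - 1 + m], index)      # counts for the window, O(m)
    B = count_symbols(P, index)                       # counts for the pattern, O(m)
    return A == B                                     # O(k) comparison
```

For instance, `is_anagram_at("abcab", "bca", 1)` compares the window "abc" against "bca".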

(b) (10 points) Give an algorithm that, given T and P, reports all i's such that $T_i^m$ is an
anagram of P. Your algorithm should run in O(n + m) time.

ANSWER: Compute A for $T_1^m$ and B for P as before. This takes O(m) time. Now compare the two in
O(k) time and determine whether P is an anagram of $T_1^m$. Now we can compute A for $T_2^m$ in
constant time: simply decrease the count for T[1] in A and increase the count for T[m+1] in A. B
remains the same. Then compare A and B again in O(k) time and determine whether P is an anagram of
$T_2^m$, and so on for each of the n − m + 1 shifts. This algorithm takes O(m + nk) = O(m + n)
time, since k is constant for a fixed alphabet.
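The sliding-window update described in this answer might look like the following in Python. It is a sketch under stated assumptions: the function name `anagram_shifts`, the default lowercase alphabet, and returning an empty list when the pattern is empty or longer than the text are choices made here for illustration. Reported shifts are 1-based to match the problem's indexing.

```python
def anagram_shifts(T, P, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Return every 1-indexed i such that T[i..i+m-1] is an anagram of P."""
    n, m = len(T), len(P)
    if m == 0 or m > n:
        return []  # assumption: no shifts are reported in these edge cases
    index = {ch: j for j, ch in enumerate(alphabet)}
    k = len(alphabet)
    A, B = [0] * k, [0] * k
    for ch in P:          # counts for the pattern, computed once
        B[index[ch]] += 1
    for ch in T[:m]:      # counts for the first window T[1..m]
        A[index[ch]] += 1
    shifts = [1] if A == B else []
    for i in range(2, n - m + 2):          # windows starting at i = 2 .. n-m+1
        A[index[T[i - 2]]] -= 1            # symbol T[i-1] (1-indexed) leaves the window
        A[index[T[i + m - 2]]] += 1        # symbol T[i+m-1] (1-indexed) enters the window
        if A == B:                         # O(k) comparison per shift
            shifts.append(i)
    return shifts


print(anagram_shifts("cbaebabacd", "abc"))  # prints [1, 7]
```

Each shift costs two counter updates plus one O(k) comparison, which is where the O(m + nk) bound comes from.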

Problem 2: Finding maximal repeats (20 points)

Consider a string T[1..n]. Define T[0] and T[n+1] as a special symbol that does not occur in T.
Consider a string P[1..m]. We say P is a maximal repeat of T if ∃ 0 < i, j ≤ n such that:

• $P = T_i^m = T_j^m$, $i \neq j$

• $T_{i-1}^{m+1} \neq T_k^{m+1}$ $\forall k \neq i-1$

• $T_i^{m+1} \neq T_k^{m+1}$ $\forall k \neq i$

(a) (5 points) Based on the above definition, explain in plain English what a maximal repeat is.

A maximal repeat P of length m is a string that occurs at least twice in T such that one of the
occurrences, say $P = T_i^m$, cannot be extended to the left or right and still repeat.
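To make the definition concrete, here is a brute-force Python check that follows it literally. It is only an illustrative sketch, not an efficient method: it takes roughly O(n²m) time. The function name `is_maximal_repeat` and the `"$"` sentinel are assumptions, and the two uniqueness conditions are read as ranging over every window other than the extended occurrence itself.

```python
def is_maximal_repeat(T, P, sentinel="$"):
    """Brute-force test of the maximal-repeat definition (quadratic, for illustration)."""
    n, m = len(T), len(P)
    if sentinel in T:
        raise ValueError("sentinel must not occur in T")
    if m == 0 or m > n:
        return False
    S = sentinel + T + sentinel  # S[0] and S[n+1] play the roles of T[0] and T[n+1]
    # All 1-indexed positions i with T[i..i+m-1] == P.
    occurrences = [i for i in range(1, n - m + 2) if S[i:i + m] == P]
    if len(occurrences) < 2:
        return False             # P must occur at two distinct positions
    # windows[k] is the length-(m+1) substring starting at position k, k = 0 .. n-m+1.
    windows = [S[k:k + m + 1] for k in range(n - m + 2)]

    def is_unique(pos):
        return all(w != windows[pos] for k, w in enumerate(windows) if k != pos)

    # Some occurrence must have a unique left extension and a unique right extension.
    return any(is_unique(i - 1) and is_unique(i) for i in occurrences)
```

For example, in T = "xabcyabcz" the string "abc" passes this check, while "ab" does not, because every occurrence of "ab" extends to the right into "abc", which repeats.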


ANSWER: