Lab 5: Perl and Regular Expressions - Finding Anagrams

BIF 101 Lab 5 – October 2, 2008 – more on Perl and Regular Expressions

Assigned: Thursday October 2

Due: Thursday October 9

Description:

You will use another Perl program that was included with the files you downloaded for Lab 4. It is called anagram.pl, and you will alter it in a manner similar to the way you altered the regex.pl program.

An anagram, as the term is used here, is a word formed from letters of another word, not necessarily all of them. The anagram.pl program finds all anagrams of a given word in the big_english.txt file and lists them. For example, anagrams of the word “from” include “from”, “form”, “for”, and “or”.

There are two steps to finding the anagrams of a word. First, all the words that contain only letters found in your word must be selected. A pattern to do that is /^[yourword]*$/, where yourword is replaced by the word you want to play with. This pattern selects anagrams of your word, but it also allows letters to be repeated, so it is not exactly what is needed. For example, /^[from]*$/ matches “off”, even though there aren’t two “f”s in “from”. The pattern ensures only that each letter in a matched word is found in the original word; it says nothing about how many times.
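
A minimal sketch of this first step, assuming the starting word “from” and a small hard-coded word list in place of big_english.txt. The sub name uses_only_letters and the sample words are made up for illustration; anagram.pl itself may be organized differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if every letter of $candidate occurs in $word.
# It deliberately does not check how often each letter is used.
sub uses_only_letters {
    my ($word, $candidate) = @_;
    my $class = quotemeta $word;    # escape anything special inside [...]
    return $candidate =~ /^[$class]*$/ ? 1 : 0;
}

# Illustrative sample; the lab reads words from big_english.txt instead.
my @sample = qw(form for or off frame);

for my $candidate (@sample) {
    printf "%-6s %s\n", $candidate,
        uses_only_letters('from', $candidate) ? 'selected' : 'rejected';
}
```

In this sketch “off” is selected along with the true anagrams, which is exactly the over-matching described above; “frame” is rejected because “a” and “e” do not occur in “from”.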

To cull the unwanted words found by the first pattern, a second pattern describes what is NOT wanted. This is a bit harder to understand, particularly for complex examples. If your starting word has no repeated letters, it is fairly easy: you just need a pattern that matches a word with ANY repeated letters, which is exactly what you do NOT want. A pattern that matches a word with at least one repeated letter is /(.).*\1/

The (.) matches and captures a single character, the .* matches zero or more of any characters, and finally the \1 is a backreference that matches the same character that (.) captured.
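
The two steps can be combined as in the sketch below, again assuming a starting word with no repeated letters (“from”) and a hard-coded sample list. The sub names has_repeated_letter and is_anagram_of are hypothetical and are not taken from anagram.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if any character occurs at least twice.
# (.) captures one character, .* skips zero or more characters,
# and \1 must match the captured character again.
sub has_repeated_letter {
    my ($candidate) = @_;
    return $candidate =~ /(.).*\1/ ? 1 : 0;
}

# Hypothetical helper: step 1 keeps words built only from the letters of
# $word; step 2 discards those that repeat a letter. This is only valid
# when $word itself has no repeated letters.
sub is_anagram_of {
    my ($word, $candidate) = @_;
    my $class = quotemeta $word;
    return 0 unless $candidate =~ /^[$class]*$/;
    return has_repeated_letter($candidate) ? 0 : 1;
}

# Illustrative sample; the lab reads words from big_english.txt instead.
my @sample   = qw(from form for or off roof frame);
my @anagrams = grep { is_anagram_of('from', $_) } @sample;
print "@anagrams\n";
```

With this sample list the sketch prints “from form for or”: “off” and “roof” pass the first pattern but are culled by the second, and “frame” never passes the first.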

To Do:

Carefully read the rest of chapter 4, starting on page 70.

Write your answers to exercises a-e in the Going Back for More section at the top of page 72 in a Word document. Note that question e is purely speculative; you will not write any code for it. For the other parts (a-d), show the two regular expressions you developed, describe the output, and include a couple of samples of the output the program produced. In each case, say whether or not the output was what you expected. If not, what do you think was wrong?

For sections 4.10 and 4.11, read carefully and type in the programs on pages 74 and 75. Make sure to augment the code in the book with comments, a #! line, and use strict and use warnings lines. Copy and paste your code into your Word document. Run these programs and copy and paste the output into your Word document. Explain what each program does and explain how it would work on a different example: Gallus gallus. Either type this into your two programs and copy/paste the output here, or describe what results you would expect. Find out what creature Gallus gallus is and write a sentence about it at the end of your report.
