Lab 5: Perl and Regular Expressions - Finding Anagrams, Lab Reports of Bioinformatics

The fifth lab assignment for BIF 101, focusing on Perl and regular expressions. Students use the anagram.pl program to find anagrams of a given word in the big_english.txt file. The lab explains how to use regular expressions to select words that contain only the letters of the given word and then filter out unwanted words. Students are expected to complete exercises on regular expressions and write their answers in a Word document.

Uploaded on 08/18/2009

BIF 101 Lab 5 – October 2, 2008 – more on Perl and Regular Expressions
Assigned: Thursday October 2
Due: Thursday October 9
Description:
You will use another Perl program that was included with the files you downloaded for Lab 4. It
is called anagram.pl, and you will alter it in a manner similar to the way you altered the regex.pl
program.
An anagram, as the term is used in this lab, is a word formed from some or all of the letters of
another word. The anagram.pl program finds all anagrams for a given word in the big_english.txt
file and lists them out. For example, anagrams for the word “from” include “from”, “form”, “for”,
and “or”.
In order to find the anagrams of a word there are two steps. First, all the words that contain only
letters from the given word must be selected. A pattern to do that is /^[yourword]*$/, where
yourword is replaced by the word you want to play with. This pattern selects the anagrams of your
word, but it also allows letters to be repeated, so it is not exactly what is needed. For
example, /^[from]*$/ matches “off”, even though there aren’t two “f”s in “from”. This pattern
ensures only that each letter in a matched word is found in the original word; it does not say
how many times.
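
The behavior of the first pattern can be checked with a short sketch. This is not the anagram.pl program itself; the word list and the sub name only_letters_from are hypothetical stand-ins for illustration, and the list replaces big_english.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample list standing in for big_english.txt.
my @words = qw(from form for or off roof room more farm of);

# Keep only the words built entirely from the letters f, r, o and m.
# The sub name is an assumption made for this sketch.
sub only_letters_from {
    my @candidates = @_;
    return grep { /^[from]*$/ } @candidates;
}

# Prints: from form for or off roof room of
print join(" ", only_letters_from(@words)), "\n";
```

Note that “off”, “roof” and “room” all pass, because the pattern does not limit how often a letter is used.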
To cull out the unwanted words found by the first pattern there is a second pattern that describes
what is NOT wanted. This is a bit harder to understand, particularly for complex examples. If
your starting word has no repeated letters it is fairly easy. You just need a pattern that matches a
word with ANY repeated letters – which is exactly what you do NOT want. A pattern that
matches a word with at least one repeated letter is /(.).*\1/
The (.) matches and captures a single character, .* matches zero or more of any characters, and
the \1 is a backreference that matches the same character that (.) captured.
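
The two steps can be combined in a small sketch: keep a word if it matches the first pattern and does NOT match the repeated-letter pattern. Again, this is only an illustration under assumptions; the sub name anagram_candidates and the sample list are hypothetical and are not taken from anagram.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1 keeps words made only of the letters in "from".
# Step 2 rejects any word in which some character occurs twice.
# The sub name is an assumption made for this sketch.
sub anagram_candidates {
    my @candidates = @_;
    return grep { /^[from]*$/ && !/(.).*\1/ } @candidates;
}

# Hypothetical sample list standing in for big_english.txt.
my @sample = qw(from form for or off roof room more of);

# Prints: from form for or of
print join(" ", anagram_candidates(@sample)), "\n";
```

This simple negation only works when the starting word has no repeated letters, which is the easy case described above.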
To Do:
Carefully read the rest of chapter 4, starting on page 70.
Write your answers to exercises a-e in the Going Back for More section at the top of page 72 in a
Word document. Note that question e is purely speculative; you will not write any code. For the
other parts (a-d) show the two regular expressions you developed, describe the output and
include a couple of samples of the output the program produced. In each case say whether or not
the output was what you expected. If not, what do you think was wrong?
For sections 4.10 and 4.11, read carefully and type in the programs on pages 74 and 75. Make
sure to augment the code in the book with comments, a #! line, and the use strict and use warnings
lines. Copy and paste your code into your Word document. Run these programs and copy and paste
the output into your Word document. Explain what the program in each case does and explain how
it would work on a different example: Gallus gallus. Either type this into your two programs
and copy/paste the output here or describe what you would expect in terms of results. Find out
what creature Gallus gallus is and write a sentence about it at the end of your report.
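
As a rough guide, the lines to add around the book's code might look like the sketch below. The body is only a placeholder and does not reproduce the programs on pages 74 and 75; the variable name $organism is hypothetical, and the path in the #! line is an assumption that may differ on your machine.

```perl
#!/usr/bin/perl
# Purpose: placeholder header; replace this comment with a description
# of what the program from the book does.
use strict;     # require variables to be declared (for example with my)
use warnings;   # report likely mistakes, such as using an undefined value

# Hypothetical variable holding the alternative example named in the lab.
my $organism = "Gallus gallus";

# Placeholder body: the code typed in from the book would go here.
print "Example input: $organism\n";
```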
