Group assignment
Spring Semester 2006
“Reality is not always probable, or likely”.
“Nothing is built on stone; all is built on sand, but we must build as if the sand were stone”. (Jorge Luis Borges)
First Installment
Text Frequency Analysis
- Download The Lottery of Babylon and La Loteria en Babilonia (Text Documents)
- Produce the frequency distribution of letters in both texts
- Produce the relative frequency distribution of letters in both texts
- Produce the cumulative relative frequency distribution of letters in both texts
- Upload results to the Group Project 1 assignment folder in Oncourse.
- Due on March 9th, 2007
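The three distributions requested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `letter_distributions` is hypothetical, case is ignored, and any character for which `str.isalpha()` is true counts as a letter, so accented Spanish letters such as `á` or `ñ` are tallied separately from their unaccented forms.

```python
from collections import Counter


def letter_distributions(text):
    """Return (letter, frequency, relative, cumulative) rows sorted by letter.

    Assumption: case is ignored and only alphabetic characters are counted.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = sum(counts.values())
    if total == 0:
        return []

    rows = []
    cumulative = 0.0
    for letter in sorted(counts):
        relative = counts[letter] / total   # relative frequency
        cumulative += relative              # running sum of relative frequencies
        rows.append((letter, counts[letter], relative, cumulative))
    return rows


# Placeholder input; in practice this would be the full downloaded text.
for letter, freq, rel, cum in letter_distributions("The Lottery of Babylon"):
    print(f"{letter}  {freq:3d}  {rel:.3f}  {cum:.3f}")
```

Running the same function on both the English and the Spanish text gives two tables that can be compared side by side.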
Second Installment
Text Statistics and Probability
- Download The Lottery of Babylon and La Loteria en Babilonia (Text Documents)
- Using the frequency distribution of letters of both texts produced in the first installment, calculate the measures of central tendency and dispersion discussed in class, for letter frequency
- Calculate the probability of a letter being a vowel in both texts
- Calculate the probability of a letter being a consonant in both texts
- Calculate the conditional probabilities of the letters ‘e’ and ‘u’, given each letter that occurs immediately before them, in both texts
- P(e|♥) where ♥ is the letter occurring before ‘e’
- P(u|♥) where ♥ is the letter occurring before ‘u’
- Compute for all letters (not space)
- Produce histogram of P(e|♥), for all ♥.
- Produce histogram of P(u|♥), for all ♥.
- Discuss the independence of ‘e’ and ‘u’ from other letters
- Tip: use the Textanalyzer to calculate digrams. It will give you the frequency of pairs of letters
- Upload results to the Group Project 2 assignment folder in Oncourse.
- Due on April 6th, 2006
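The probability steps above could be approached along the following lines. This is a sketch, not the Textanalyzer's method: it assumes that P(e|♥) is estimated as the number of digrams "♥e" divided by the number of digrams that start with ♥, that digrams are counted only between adjacent letters (so pairs spanning a space or punctuation are skipped), and that the vowel set includes the accented Spanish vowels. The names `VOWELS`, `vowel_probability`, and `conditional_probabilities` are hypothetical.

```python
from collections import Counter

# Assumption: accented Spanish vowels count as vowels.
VOWELS = set("aeiouáéíóúü")


def vowel_probability(text):
    """Fraction of letters that are vowels; the consonant probability is 1 minus this."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in VOWELS for ch in letters) / len(letters)


def conditional_probabilities(text, target):
    """Estimate P(target | previous) for every letter `previous`.

    Only digrams made of two adjacent letters are counted, so spaces and
    punctuation never appear as the preceding symbol.
    """
    text = text.lower()
    previous_counts = Counter()  # digrams starting with each letter
    pair_counts = Counter()      # digrams of the form previous + target
    for previous, current in zip(text, text[1:]):
        if previous.isalpha() and current.isalpha():
            previous_counts[previous] += 1
            if current == target:
                pair_counts[previous] += 1
    return {p: pair_counts[p] / n for p, n in sorted(previous_counts.items())}


# Placeholder input; in practice this would be the full downloaded text.
sample = "the lottery of babylon"
print(vowel_probability(sample))
print(conditional_probabilities(sample, "e"))
```

For the independence discussion, each P(e|♥) can be compared with the unconditional P(e): if ‘e’ were independent of the preceding letter, the two values would be approximately equal for every ♥.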
Third Installment
Storing the Data Gathered
- Given any text such as the library of babylon
- Create a database model and a relational database instance using Microsoft Access to store the data and conclusions from previous installments. Use the entity-relationship model
- Examples of items that should appear: title, author, language, publication date; frequency/probability of each letter; conditional probabilities for letters ‘e’ and ‘u’ (as produced in installment 2); positively and negatively dependent letters
- Use at least 4 texts
- Upload results to the Group Project 3 assignment folder in Oncourse.
- Due on April 27th, 2007
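One possible relational layout for the items listed above is sketched here. The assignment itself calls for Microsoft Access; SQLite (through Python's standard `sqlite3` module) is swapped in only to make the entity-relationship idea concrete, and every table and column name below is a hypothetical choice rather than a required design. The sketch assumes one entity for texts and two dependent entities, each tied to a text by a foreign key.

```python
import sqlite3

# Hypothetical schema: one row per text, per (text, letter),
# and per (text, target letter, preceding letter).
SCHEMA = """
CREATE TABLE source_text (
    text_id          INTEGER PRIMARY KEY,
    title            TEXT NOT NULL,
    author           TEXT,
    language         TEXT,
    publication_date TEXT
);
CREATE TABLE letter_frequency (
    text_id     INTEGER NOT NULL REFERENCES source_text(text_id),
    letter      TEXT    NOT NULL,
    frequency   INTEGER NOT NULL,
    probability REAL    NOT NULL,
    PRIMARY KEY (text_id, letter)
);
CREATE TABLE conditional_probability (
    text_id         INTEGER NOT NULL REFERENCES source_text(text_id),
    target_letter   TEXT    NOT NULL,  -- 'e' or 'u' in this assignment
    previous_letter TEXT    NOT NULL,
    probability     REAL    NOT NULL,
    dependence      TEXT,              -- e.g. 'positive' or 'negative'
    PRIMARY KEY (text_id, target_letter, previous_letter)
);
"""


def build_database(path=":memory:"):
    """Create the tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = build_database()
conn.execute(
    "INSERT INTO source_text (title, author, language) VALUES (?, ?, ?)",
    ("The Lottery of Babylon", "Jorge Luis Borges", "English"),
)
print(conn.execute("SELECT text_id, title FROM source_text").fetchall())
```

The same three entities and their one-to-many relationships can be recreated as tables and relationships in Access.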

