Group assignment

Spring Semester 2006

“Reality is not always probable, or likely”.

“Nothing is built on stone; all is built on sand, but we must build as if the sand were stone”. (Jorge Luis Borges)

First Installment

Text Frequency Analysis


Second Installment

Text Statistics and Probability

  • Download The Lottery of Babylon and La Loteria en Babilonia (Text Documents)
  • Using the frequency distribution of letters produced for both texts in the first installment, calculate the measures of central tendency and dispersion of letter frequency discussed in class
  • Calculate the probability of a letter being a vowel in both texts
  • Calculate the probability of a letter being a consonant in both texts
  • Calculate the conditional probabilities of letters ‘e’ and ‘u’, given each letter occurring immediately before them, in both texts
    • P(e|♥) where ♥ is the letter occurring before ‘e’
    • P(u|♥) where ♥ is the letter occurring before ‘u’
    • Compute for all letters (not space)
    • Produce histogram of P(e|♥), for all ♥.
    • Produce histogram of P(u|♥), for all ♥.
    • Discuss the independence of ‘e’ and ‘u’ from other letters
    • Tip: use the Textanalyzer to calculate digrams. It will give you the frequency of pairs of letters
  • Upload results to the Group Project 2 assignment folder in Oncourse.
  • Due on April 6th, 2006
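
The calculations in this installment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method required in class and not a replacement for the Textanalyzer: the function names and the VOWELS set are hypothetical, accented Spanish vowels are counted as vowels, and P(e|♥) is taken to be the number of digrams ♥e divided by the number of digrams that start with ♥, with both characters being letters (pairs involving a space are skipped).

```python
from collections import Counter
import statistics

# Assumption: these characters count as vowels; every other letter is a consonant.
VOWELS = set("aeiouáéíóúü")

def letter_counts(text):
    """Count each letter in the text, ignoring case, spaces, digits and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

def summary_statistics(counts):
    """Central tendency and dispersion of the per-letter frequencies."""
    values = list(counts.values())
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.pvariance(values),  # population variance
        "std_dev": statistics.pstdev(values),      # population standard deviation
        "range": max(values) - min(values),
    }

def vowel_probability(counts):
    """P(vowel); the consonant probability is 1 minus this value."""
    total = sum(counts.values())
    vowels = sum(n for ch, n in counts.items() if ch in VOWELS)
    return vowels / total

def conditional_probabilities(text, target):
    """Return {previous_letter: P(target | previous_letter)} from letter digrams."""
    text = text.lower()
    starts = Counter()  # digrams beginning with each letter
    hits = Counter()    # digrams of the form (letter, target)
    for prev, cur in zip(text, text[1:]):
        if prev.isalpha() and cur.isalpha():
            starts[prev] += 1
            if cur == target:
                hits[prev] += 1
    return {prev: hits[prev] / n for prev, n in starts.items()}
```

For the independence discussion, one option is to compare each P(e|♥) with the unconditional P(e), that is, the count of ‘e’ divided by the total number of letters. If ‘e’ were independent of the preceding letter, the two values would be approximately equal for every ♥.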

Third Installment

Storing the Data Gathered

  • Given any text, such as The Lottery of Babylon
    • Create a database model and a relational database instance using Microsoft Access to store the data and conclusions from previous installments. Use the entity-relationship model
    • Examples of items that should appear: title, author, language, publication date, frequency/probability of each letter, conditional probabilities for letters ‘e’ and ‘u’ (as produced in installment 2), and positively and negatively dependent letters
  • Use at least 4 texts
  • Upload results to the Group Project 3 assignment folder in Oncourse.
  • Due on April 27th, 2007
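
One possible entity-relationship layout for the items listed above is sketched below. The assignment asks for Microsoft Access; this sketch swaps in Python's built-in sqlite3 module only so that the schema can be run and tested as-is. The table names, column names and the create_database and add_text helpers are all hypothetical, and the same three entities (texts, per-letter statistics, conditional probabilities) would need to be recreated as Access tables.

```python
import sqlite3

# Assumed schema: one row per text, one row per (text, letter),
# and one row per (text, target letter, preceding letter).
SCHEMA = """
CREATE TABLE texts (
    text_id          INTEGER PRIMARY KEY,
    title            TEXT NOT NULL,
    author           TEXT,
    language         TEXT,
    publication_date TEXT
);
CREATE TABLE letter_stats (
    text_id     INTEGER NOT NULL REFERENCES texts(text_id),
    letter      TEXT NOT NULL,
    frequency   INTEGER NOT NULL,
    probability REAL NOT NULL,
    PRIMARY KEY (text_id, letter)
);
CREATE TABLE conditional_probabilities (
    text_id         INTEGER NOT NULL REFERENCES texts(text_id),
    target_letter   TEXT NOT NULL,   -- 'e' or 'u'
    previous_letter TEXT NOT NULL,
    probability     REAL NOT NULL,
    dependence      TEXT CHECK (dependence IN ('positive', 'negative', 'none')),
    PRIMARY KEY (text_id, target_letter, previous_letter)
);
"""

def create_database(path=":memory:"):
    """Open a database (in memory by default) and create the three tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def add_text(conn, title, author, language, publication_date):
    """Insert one text and return its generated text_id."""
    cur = conn.execute(
        "INSERT INTO texts (title, author, language, publication_date) "
        "VALUES (?, ?, ?, ?)",
        (title, author, language, publication_date),
    )
    conn.commit()
    return cur.lastrowid
```

Keeping the letter statistics and conditional probabilities in their own tables, keyed by text_id, lets the same structure hold all four or more texts without repeating the title and author on every row.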


For more information contact Luis Rocha or Santiago Schnell.
Last Modified: April 19, 2007