STATISTICS

1. In a corpus, a given word occurs 32 times in SPOKEN (1,570,000 words) and 338 times in WRITTEN (21,000,000 words). Which section has more tokens per million words?
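
A minimal Python sketch of the normalization, assuming only the counts and section sizes given above: divide each raw count by the section size and multiply by one million. The helper name `per_million` is hypothetical.

```python
def per_million(tokens, corpus_size):
    # Normalize a raw token count to a rate per 1,000,000 words.
    return tokens / corpus_size * 1_000_000

spoken = per_million(32, 1_570_000)
written = per_million(338, 21_000_000)
print(f"SPOKEN:  {spoken:.2f} per million")
print(f"WRITTEN: {written:.2f} per million")
```

Comparing the normalized rates, rather than the raw counts, is what makes sections of very different sizes comparable.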

2. The following are the frequencies of six words in the FICTION and ACADEMIC sections of the BYU Corpus of American English. Is there a correlation between the frequencies in the two sections? What is the r value? (In Excel, use Insert Function > PEARSON.)

          break (v)  crack (n)  perfectly (r)  somewhat (r)  rational (j)  absurd (j)
FICTION        5184       2074           3893          2438           505         883
ACADEMIC       2405        406           1409          5711          2931         514
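
As a cross-check on the spreadsheet result, here is a minimal Python sketch of Pearson's r computed directly from its definition (the covariance of the two columns divided by the product of their standard deviations). The helper name `pearson_r` is hypothetical; it should agree with Excel's PEARSON on the same two rows.

```python
import math

# Frequencies from the table above, in the same column order.
fiction = [5184, 2074, 3893, 2438, 505, 883]
academic = [2405, 406, 1409, 5711, 2931, 514]

def pearson_r(x, y):
    # Pearson correlation: sum of co-deviations over the product of root sums of squares.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(fiction, academic)
print(f"r = {r:.3f}")
```

An r near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little or none.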

3. The following are test scores from two classes. Is there a significant difference between the two classes? What is the p value? (Use an unpaired t-test.)

Class 1 28 32 34 37 29 27 20
Class 2 29 30 39 36 22 21 18
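
A minimal Python sketch of an unpaired t-test, assuming the pooled-variance (Student's) form with a two-tailed p value; a Welch test, which does not assume equal variances, would give a slightly different result. Because the standard library has no t distribution, the p value here is obtained by integrating the t density numerically with Simpson's rule. All function names are hypothetical.

```python
import math

class1 = [28, 32, 34, 37, 29, 27, 20]
class2 = [29, 30, 39, 36, 22, 21, 18]

def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    # p = 1 - 2 * (area under the density from 0 to |t|), via Simpson's rule.
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

def pooled_t_test(x, y):
    # Student's t-test for two independent samples with pooled variance.
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    return t, df, two_tailed_p(t, df)

t, df, p = pooled_t_test(class1, class2)
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
```

By the usual convention, the difference counts as significant only if p falls below the chosen threshold (commonly 0.05).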

4. The following are the numbers of Utahns and non-Utahns who prefer "I'll help who's next" vs. "I'll help whoever's next." Is there a difference? What is the p value? (Use a chi-square test.)

form     Utahns  Non-Utahns
who          13          18
whoever      28          32
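
A minimal Python sketch of the chi-square test for this 2x2 table, assuming the plain Pearson statistic without Yates' continuity correction (applying the correction would give a different p value). Expected counts come from row total times column total divided by the grand total. With one degree of freedom, the chi-square statistic is the square of a standard normal variable, so its p value is `erfc(sqrt(chi2 / 2))`. The function name is hypothetical.

```python
import math

# Rows: who's next, whoever's next; columns: Utahns, non-Utahns.
observed = [
    [13, 18],
    [28, 32],
]

def chi_square_2x2(table):
    # Pearson chi-square for a 2x2 table (df = 1), no continuity correction.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (obs - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

chi2, p = chi_square_2x2(observed)
print(f"chi2 = {chi2:.3f}, df = 1, p = {p:.3f}")
```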