QUANTITATIVE METHODS (STATISTICS)
 

McEnery, Tony and Andrew Wilson (2001) Corpus Linguistics. 2nd edition. Edinburgh UP. 75-101.
  1. Types of data

  • Quantitative (test scores, number of tokens of a word)

  • Ordinal (top 50 words in FICT and ACAD)

  • Nominal (M/F, state of origin, ethnic background)

  1. What's the difference between a qualitative and a quantitative analysis of a corpus?
  2. What are two or three basic principles that should be kept in mind as one attempts to create a representative corpus?
  3. Why is it necessary to use proportions? Where have we used them to this point?
  • 120 tokens in 20m words vs. 30 in 4m words

  • 736 tokens in 50m words vs. 367 in 20m words
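
The two comparisons above can be worked through with a short sketch that converts each raw count to a rate per million words. The function name `per_million` and the choice of "per million" as the base are assumptions made for illustration; any common base (per thousand, per 100,000) would serve the same purpose.

```python
# Normalize raw counts to a rate per million words so that corpora of
# different sizes can be compared directly.

def per_million(tokens: int, corpus_size: int) -> float:
    """Return the frequency of a word per million words of corpus."""
    return tokens * 1_000_000 / corpus_size

# (tokens, corpus size in words) pairs taken from the two examples above
examples = [
    ((120, 20_000_000), (30, 4_000_000)),
    ((736, 50_000_000), (367, 20_000_000)),
]

for (tok_a, size_a), (tok_b, size_b) in examples:
    rate_a = per_million(tok_a, size_a)
    rate_b = per_million(tok_b, size_b)
    print(f"{tok_a} in {size_a:,} words = {rate_a:.2f} per million; "
          f"{tok_b} in {size_b:,} words = {rate_b:.2f} per million")
```

In both pairs the smaller raw count turns out to be the higher rate (6.00 vs. 7.50 and 14.72 vs. 18.35 per million), which is why raw counts alone can mislead.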

  1. What info does the chi-square test give?
                 spoken        written
  tokens         81            123
  corpus size    70,000,000    70,000,000
  • With chi-square, we want p <= .05 for the difference to count as statistically significant.
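
As a rough illustration, the figures in the table above can be run through a chi-square test by hand. This sketch assumes that 81 and 123 are the token counts of some word in the spoken and written sections, and that 70,000,000 is the size in words of each section; the table does not state this outright. It uses the plain Pearson formula without a continuity correction, and the function name `chi_square_2x2` is made up for this example.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, p-value) for 1 degree of freedom."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    total = a + b + c + d
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the word were equally likely in both sections
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Assumed reading of the table: hits of the word vs. all other words
corpus_size = 70_000_000
spoken_hits, written_hits = 81, 123

stat, p = chi_square_2x2(
    spoken_hits, corpus_size - spoken_hits,
    written_hits, corpus_size - written_hits,
)
print(f"chi-square = {stat:.2f}, p = {p:.4f}, significant at .05: {p <= 0.05}")
```

Under these assumptions the statistic comes out at about 8.65, with p well below .05, so the difference between 81 and 123 would count as significant.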

  1. What are the MI (mutual information) and Z-score tests used for?
  • WordCruncher
  • Comparing collocates of different words (corpus.byu.edu corpora)
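
To make the two collocation measures concrete, here is a minimal sketch of one common simplified form of each. The exact formulas differ between textbooks and tools, so these should not be read as the versions used by WordCruncher or the corpus.byu.edu corpora. The function names and every count in the example are invented for illustration.

```python
import math

def expected_cooccurrences(f_node: int, f_coll: int, corpus_size: int, span: int = 1) -> float:
    """Co-occurrences expected by chance if the two words were independent."""
    return f_node * f_coll * span / corpus_size

def mutual_information(observed: int, expected: float) -> float:
    """MI score: log2 of observed over expected co-occurrences."""
    return math.log2(observed / expected)

def z_score(observed: int, expected: float) -> float:
    """Simplified z-score: distance from expectation in units of sqrt(expected)."""
    return (observed - expected) / math.sqrt(expected)

# Hypothetical counts, not taken from any real corpus
corpus_size = 1_000_000   # words in the corpus
f_node = 100              # frequency of the node word
f_coll = 200              # frequency of the candidate collocate
observed = 8              # times the collocate appears next to the node

expected = expected_cooccurrences(f_node, f_coll, corpus_size)
print(f"expected = {expected:.2f}")
print(f"MI = {mutual_information(observed, expected):.2f}")
print(f"z = {z_score(observed, expected):.2f}")
```

Both scores grow as the observed count pulls away from the chance expectation. MI is a ratio on a log scale, so it tends to reward rare words heavily, while the z-score also reflects how much evidence there is.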
  1. What shortcomings do the MI and Z-score tests have? How do multivariate tests help?
  1. Briefly discuss the type of factor analysis that Biber (1993a) deals with.

    Biber/Davies - multi-dimensional analyses