STRUCTURED CORPORA (CHAP 19)

  1. Examples of corpora
    1. British National Corpus
    2. TIME Corpus of American English
    3. Corpus del Español
  2. What is a corpus?
    1. Daily Universe online??
    2. Printed book??
    3. Letters of Abraham Lincoln??
    4. General Conference??
    5. The Web (via Google)??
  3. General methodology
    1. Get out what is put in: textual
    2. Get out what is put in: interface
    3. Different corpora for different purposes
  4. Possible uses of corpora
    1. Linguistic variation (words, phrases, syntax in different registers)
    2. Historical change (words entering/leaving languages, differences between centuries)
    3. Stylistic variation (e.g. NY Times vs Washington Times, different General Authorities)
    4. Frequency information (top x words for a frequency dictionary, etc.)
  5. Creating your own corpus
    1. Web-based materials can sometimes be done quickly
    2. Often, though, quite time-consuming (especially spoken)
    3. Copyright issues
  6. Corpora of English
    1. Brown Corpus / LOB (1960s) - 1 million words each (glistening, knob)
    2. International Corpus of English (ICE) - 1980s to present
    3. British National Corpus (BNC) / Cobuild (1980s-90s) - hundreds of millions of words
    4. Specialized, like the International Corpus of Learner English (ICLE)
  7. Types of procedures
    1. Wordlists
    2. Concordances
    3. Collocations
    4. Keyword lists
  8. Next time -- the Wild World Wide Web (via Google)
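
The four procedures listed under "Types of procedures" can be sketched in a few lines of Python. This is a minimal illustration, not how any particular corpus tool works: the function names (tokenize, wordlist, concordance, collocates, keywords) and the toy sentences are assumptions made for the example. The keyword score is a simple smoothed ratio of relative frequencies; dedicated tools usually rely on statistics such as log-likelihood or chi-square instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def wordlist(tokens, n=10):
    """Wordlist: the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)


def concordance(tokens, node, width=3):
    """Concordance (KWIC): each hit of `node` with `width` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(tokens, node, span=2):
    """Collocations: count the tokens found within `span` positions of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i])
            counts.update(tokens[i + 1:i + 1 + span])
    return counts


def keywords(target, reference, n=5):
    """Keyword list: words unusually frequent in `target` compared with `reference`.

    Hypothetical scoring: relative frequency in the target divided by an
    add-one-smoothed relative frequency in the reference.
    """
    t, r = Counter(target), Counter(reference)
    nt, nr = len(target), len(reference)
    score = {w: (t[w] / nt) / ((r[w] + 1) / (nr + 1)) for w in t}
    return sorted(score, key=score.get, reverse=True)[:n]


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat. The dog sat on the log.")
    print(wordlist(tokens, 3))
    print(concordance(tokens, "sat", width=2))
    print(collocates(tokens, "sat", span=1).most_common(3))
    print(keywords(tokenize("corpus corpus corpus the"),
                   tokenize("the the the cat")))
```

Even on a toy text the point of "get out what is put in" shows up: the results depend entirely on the tokenization choices (here, lowercasing and dropping punctuation) and on which reference text the keywords are compared against.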