CORPORA -- STRUCTURED
 


Lee, David (2010). "What corpora are available?" In The Routledge Handbook of Corpus Linguistics, pp. 107-22. (Note: somewhat biased; very BNC-centric.)

Corpora, Collections, Data Archives (mainly for English) (David Lee)


General points:

  • accessibility
  • copyright (e.g. OUP / BNC)

1-2 main corpora from each category:

  • general
  • speech
  • parsed
  • historical
  • Web as Corpus
  • learner
  • parallel
  • non-English (2-3 languages mentioned?)

Pay special attention to (if covered):

  • Brown / Frown / LOB / FLOB
  • Australian Corpus of English (ACE) / Wellington Corpus / Kolhapur Corpus
  • London-Lund
  • British National Corpus (BNC)
  • American National Corpus (ANC)
  • Bank of English / Cobuild
  • International Corpus of English (ICE)
  • Switchboard
  • Helsinki
  • CHILDES
  • MICASE
  • International Corpus of Learner English (ICLE)