|
STRUCTURED CORPORA (CHAP 19)
- Examples of corpora
- British National Corpus
- TIME Corpus of American English
- Corpus del Español
- What is a corpus?
- Daily Universe online??
- Printed book??
- Letters of Abraham Lincoln??
- General Conference??
- The Web (via Google)??
- General methodology
- Get out what is put in: textual
- Get out what is put in: interface
- Different corpora for different purposes
- Possible uses of corpora
- Linguistic variation (words, phrases, syntax in different registers)
- Historical change (words entering/leaving languages, differences between centuries)
- Stylistic variation (e.g. NY Times vs Washington Times, different General Authorities)
- Frequency information (top x words for frequency dictionary, etc.)
- Creating your own corpus
- Web-based materials can sometimes be collected quickly
- Often, though, quite time-consuming (especially spoken)
- Copyright issues
- Corpora of English
- Brown Corpus / LOB (1960s) - 1 million words (glistening, knob)
- International Corpus of English (ICE) - 1980s to present
- British National Corpus (BNC) / Cobuild (1980s-90s) - hundreds of millions of words
- Specialized, like the International Corpus of Learner English (ICLE)
- Types of procedures
- Wordlists
- Concordances
- Collocations
- Keyword lists
- Next time -- the Wild World Wide Web (via Google)
|
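The four procedures listed under "Types of procedures" can be tried on a toy text. What follows is a minimal sketch, not the method of any particular corpus tool: the function names, the crude tokenizer, the made-up sample sentences, and the simple frequency-ratio keyword score are all assumptions for illustration (real tools typically use statistics such as log-likelihood or mutual information).

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())

def wordlist(tokens, top=10):
    """Wordlist: the `top` most frequent tokens with their counts."""
    return Counter(tokens).most_common(top)

def concordance(tokens, node, width=3):
    """Concordance (KWIC): each hit of `node` with `width` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

def collocates(tokens, node, span=2):
    """Collocations: count the words occurring within `span` tokens of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

def keywords(target, reference, top=5):
    """Keyword list: words relatively more frequent in `target` than in `reference`.
    Hypothetical score: a plain frequency ratio with add-one smoothing."""
    t, r = Counter(target), Counter(reference)
    score = {
        w: (t[w] / len(target)) / ((r[w] + 1) / (len(reference) + 1))
        for w in t
    }
    return sorted(score, key=score.get, reverse=True)[:top]

# Made-up sample texts, for illustration only.
sample = tokenize(
    "The old knob on the door was glistening. The knob turned and the door opened."
)
reference = tokenize("The cat sat on the mat and the dog sat on the rug.")

print(wordlist(sample, 3))
print(concordance(sample, "knob"))
print(collocates(sample, "knob").most_common(3))
print(keywords(sample, reference, 3))
```

The same "get out what is put in" point applies here: the results depend entirely on the texts loaded and on choices such as the tokenizer and the context window.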