| | Date | Geography | Size | Content | Notes |
|---|---|---|---|---|---|
| SEU | 1959- | | 1m | 1/2 written, 1/2 spoken | |
| First generation (most in ICAME Collection) | | | | | |
| Brown | 1961 | US | 1m | All written | Indifference/hostility |
| LOB | 1961 | | 1m | Approx. same | Approx. same |
| FLOB | 1991 | US/UK | 1m each | Approx. same | Approx. same |
| Australia | 1978 | | 1m each | Approx. same | Approx. same (minus Western, SF, romance in Kolhapur) |
| London-Lund | | | 500k | | From the SEU |
| Second generation "mega corpora" | | | | | |
| | 1980s- | 70% UK | 7.3m (1982) | 25% spoken | Monitor corpus has morphed into Bank of English |
| | 1990- | Many countries | 1m each | Overview | |
| | 1991-95 | UK | 100m | Spreadsheet | Help from British government |
| | c. 2000- | US | ~11m | | |

Language acquisition -- CHILDES (c. 1985-; ~20m)

Lexicographical -- Need 500m words (surcingle)

Other -- European Corpus Initiative

Spoken -- Lancaster/IBM Spoken English Corpus (SEC), 1984-87; 52k. Linguistic Data Consortium: membership-based; collects corpora; used by programmers and for speech recognition; transcribed orthographically and phonetically, with time stamps. Examples:

Historical (1500s-1900s) -- Fairly complete listing