|
|
Date |
Geography |
Size |
Content |
Notes |
|
SEU ("index" cards) |
1959- |
|
1m |
1/2 written, 1/2 spoken |
|
| First generation (most in ICAME Collection) (also available from class website) | |||||
|
Brown |
1961 |
US |
1m |
All written |
Indifference/hostility |
|
Brown = (on average) 1/400th the number of tokens |
|||||
|
LOB |
1961 |
|
1m |
Approx same |
Approx same |
|
FLOB |
1991 |
US/UK |
1m each |
Approx same |
Approx same |
|
Australia |
1978 |
|
1m each |
Approx same |
Approx same (-Western, SF, romance in Kolhapur) |
|
London-Lund |
|
|
500k |
|
From the SEU |
| Second generation "mega corpora" | |||||
|
1980s > |
70% UK |
7.3m 1982 |
25% spoken |
Monitor corpus has morphed into Bank of English |
|
|
1991-95 |
UK |
100m |
Spreadsheet |
Help from British gov't |
|
|
c2000 > |
US |
~11m |
|
||
|
1990 > |
Many countries |
1m each |
Overview |
|
|
| COCA | 1990-present | US | 425 million | ||
| Sketch Engine | |||||
|
Language acquisition CHILDES
(c1985-; ~20m) |
|
-- Need 500m words (surcingle) |
|
Other -- European Corpus Initiative |
|
-- Lancaster/IBM Spoken English Corpus (SEC): 1984-87; 52k
-- Linguistic Data Consortium: membership; collects corpora; used by programmers and for speech recognition; transcribed orthographically and phonetically, with time stamps. Examples:
|
|
Historical (1500s-1900s) -- -- ARCHER -- Fairly complete listing |
|
|