HISTORICAL I
Text archives
-
Literature Online (LION)
-
Early English Books Online (Michigan)
-
American Periodicals (1741-1900)
-
New York Times (1851-present)
-
Oldie radio scripts
-
Movie scripts
-
Google Books
-
Project Gutenberg
-
TIME Magazine
-
OED
Part of speech (noun, verb)
Lemmatization (forms of go)
Show frequency by period
Small corpora
Large vs. small corpora (d2e Helsinki 2015)
Recent change: COCA (changes page), NOW (PPT)
My corpora
-
English
-
OED Corpus
(37m words, Old English - present)
-
TIME Corpus
(100m words, 1923-2006)
-
General Conference (25 million words, 1851-2010)
-
Early English Books Online [EEBO]: 755 million words, 1470s-1690s
-
Corpus of Historical American English [COHA]: 400m words, ~1810-2009
-
Corpus of Contemporary American English [COCA]: 560m words, 1990-present
-
News on the Web [NOW]: 5.8+ billion words, 2012-yesterday
-
Spanish
-
CORDE
-
Corpus del Español
-
Portuguese
-
Corpus do Português
Martin Hilpert's work with motion charts for COHA (general motion charts)
What can you do with a real corpus?
-
Overall frequency
-
Words and phrases
-
COHA, TIME (main page)
-
Spanish: soldado, casto
-
Gen Conf: (main page)
-
Problem: spelling and lemmatization (notwithstanding, seem, haver)
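Because the corpora above differ in size from period to period, raw counts are not directly comparable; frequency is usually reported per million words. A minimal sketch of that normalization follows, assuming each decade is already available as a list of tokens. The function name `per_million_by_decade` and the toy sentences are invented for illustration and are not drawn from COHA or TIME.

```python
def per_million_by_decade(tokens_by_decade, word):
    """Return {decade: occurrences of `word` per million tokens}."""
    rates = {}
    for decade, tokens in tokens_by_decade.items():
        total = len(tokens)
        hits = sum(1 for t in tokens if t.lower() == word)
        # Guard against an empty decade rather than dividing by zero.
        rates[decade] = hits / total * 1_000_000 if total else 0.0
    return rates

# Toy data, invented for illustration only.
toy = {
    1900: "whom did you see and whom did you call".split(),
    2000: "who did you see and who did you call today".split(),
}
rates = per_million_by_decade(toy, "whom")
```

Note that this matches exact word forms only, which is exactly where the spelling and lemmatization problem shows up: variant spellings and inflected forms would each need to be mapped to one lemma before counting.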
-
Morphemes/roots
-
-aholic, -gate in TIME
-
[fazer], [haver] in Portuguese
-
Syntactic constructions
-
end up Ving
-
who/whom
-
going to / will (COHA)
-
preposition stranding
-
accustomed to V/V-ing
-
split infinitives
-
subjunctive (if I was/were)
-
modals of obligation: should / must / ought to / need to / have to
-
problem: part of speech tagging
Advanced syntax:
-
relative pronouns ([nn*] [cst*]|[ddq*]|[pnqs*] he [v*] / [nn*] -- he [v*])
-
pre/post verbal negation (and "do support") with have (older: [p*] [have] [x*] [a*]|[d*] [nn*] / newer: [p*] [do] [x*] [have] [a*]|[d*] [nn*])
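To make the two negation queries concrete, here is a rough sketch of how such slot patterns could be matched against tagged text. It assumes tokens are stored as (word, lemma, tag) triples, that a bracketed wildcard like [nn*] matches on the tag, and that a bracketed word like [have] matches on the lemma. The helper names `slot` and `find` and the tag values in the toy sentences are illustrative assumptions, not the corpus interface itself.

```python
from fnmatch import fnmatchcase

def slot(tags=None, lemma=None):
    """Build a test for one query slot: a lemma, or one or more tag wildcards."""
    def test(token):
        _word, lem, tag = token
        if lemma is not None:
            return lem == lemma
        return any(fnmatchcase(tag, pat) for pat in tags)
    return test

# older: [p*] [have] [x*] [a*]|[d*] [nn*]
OLDER = [slot(tags=["p*"]), slot(lemma="have"), slot(tags=["x*"]),
         slot(tags=["a*", "d*"]), slot(tags=["nn*"])]
# newer: [p*] [do] [x*] [have] [a*]|[d*] [nn*]
NEWER = [slot(tags=["p*"]), slot(lemma="do"), slot(tags=["x*"]),
         slot(lemma="have"), slot(tags=["a*", "d*"]), slot(tags=["nn*"])]

def find(pattern, tokens):
    """Return start positions where every slot matches consecutive tokens."""
    n = len(pattern)
    return [i for i in range(len(tokens) - n + 1)
            if all(p(t) for p, t in zip(pattern, tokens[i:i + n]))]

# Toy tagged sentences; the tags are assumed for illustration.
older_sent = [("I", "i", "ppis1"), ("have", "have", "vh0"), ("not", "not", "xx"),
              ("the", "the", "at"), ("time", "time", "nn1")]
newer_sent = [("I", "i", "ppis1"), ("do", "do", "vd0"), ("not", "not", "xx"),
              ("have", "have", "vhi"), ("the", "the", "at"), ("time", "time", "nn1")]

older_hits = find(OLDER, older_sent)
newer_hits = find(NEWER, newer_sent)
```

Any tagging error in a slot silently removes a hit, which is why part-of-speech tagging is listed as a problem for syntactic searches.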
Collocates (changes in meaning)
-
Chip
-
Engine
-
Wife
-
Crisis
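One way to see a change in meaning is to count which words occur near a node word in each period and compare the lists. The sketch below assumes plain token lists and a symmetric window; the function name `collocates`, the `span` parameter, and the two toy sentences are invented for illustration.

```python
from collections import Counter

def collocates(tokens, node, span=4):
    """Count words within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Toy "early" and "late" texts, invented for illustration only.
early = "the wood chip fell from the axe".split()
late = "the silicon chip runs the computer".split()

early_c = collocates(early, "chip", span=2)
late_c = collocates(late, "chip", span=2)
```

With real data the raw counts would normally be filtered or ranked by an association measure so that frequent function words such as "the" do not dominate the list.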
-
Comparison by time period
-
Verbs in 1930s-40s vs 1990s-2000s
-
*heart* in 1800s vs 1900s (COHA)
-
[ Class looks for others in 1800s / 1900s: OED ]
-
Phrasal verbs
-
Feminine -ess nouns
-
ADJ with woman in OED, TIME
-
ADJ with mujer in Spanish (1800s/1900s)
-
ADJ with mulher in Portuguese (1800s/1900s)
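Comparisons like these amount to ranking words by how much more frequent they are in one period than in the other. A minimal sketch follows, assuming word counts and total sizes are already known for each period. The function name `period_ratio`, the `floor` smoothing value, and all counts are hypothetical and chosen only to illustrate the arithmetic.

```python
from collections import Counter

def period_ratio(counts_a, size_a, counts_b, size_b, floor=0.5):
    """Rank words by per-million rate in period A divided by rate in period B.

    `floor` stands in for a zero count so the ratio is always defined.
    """
    ratios = {}
    for w in set(counts_a) | set(counts_b):
        rate_a = max(counts_a.get(w, 0), floor) / size_a * 1_000_000
        rate_b = max(counts_b.get(w, 0), floor) / size_b * 1_000_000
        ratios[w] = rate_a / rate_b
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical counts for two periods of different size.
period_a = Counter({"poetess": 4, "author": 10})
period_b = Counter({"poetess": 1, "author": 20})
ranked = period_ratio(period_a, 1000, period_b, 2000)
```

Words near the top are characteristic of the first period, words near the bottom of the second, and a ratio close to 1 indicates no change once corpus size is taken into account.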
Limitations
-
Time-delay (whom, like)
-
Balance (Spanish, COHA)
-
Spelling variation: notwithstanding, peas, up (others??) // Pt lemmatization
Activity
-
Find frequency of synonyms
-
Historical syntax
Google Books (compare: COHA, GB, GB-BYU)
COCA
Using Google Books (Science)