STRUCTURED CORPORA II - The Sequel

Main idea: Use the Corpus of Contemporary American English (COCA) and the TIME Corpus to see what kinds of data a corpus can provide.

COCA:

1. Finding idioms
What are the five words (*) (in CONTEXT) most strongly associated with all forms of run as a verb ([run].[v*]) (in WORD), as measured by Mutual Information score? (Note that there are no spaces around the period joining the two halves of the query.) Set MIN FREQ 1 to "10" and check the box, set Left/Right = 3/3, and set SORT = RELEVANCE. (6 = amuck, 7 = errands)
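Mutual Information rewards words that occur near the node far more often than chance would predict, which is why it surfaces idioms rather than merely frequent words. The sketch below assumes the window-adjusted form often used for corpus collocates, MI = log2((f(node, collocate) × N) / (f(node) × f(collocate) × span)); whether COCA computes it exactly this way is an assumption, and every count in the example is invented.

```python
import math


def mutual_information(f_pair, f_node, f_collocate, corpus_size, span):
    """Window-adjusted pointwise mutual information, in bits (log base 2).

    f_pair      -- times the collocate occurs inside the window around the node
    f_node      -- total frequency of the node word (e.g. [run].[v*])
    f_collocate -- total frequency of the collocate in the whole corpus
    corpus_size -- total number of words in the corpus
    span        -- number of positions searched (Left/Right 3/3 gives 6)
    """
    # How often the pair would co-occur by chance, given the window size
    expected = f_node * f_collocate * span / corpus_size
    return math.log2(f_pair / expected)


def rank_by_mi(pair_counts, word_counts, f_node, corpus_size, span=6, min_freq=10):
    """Sort collocates by MI, skipping any seen fewer than min_freq times near the node."""
    scores = {
        word: mutual_information(f_pair, f_node, word_counts[word], corpus_size, span)
        for word, f_pair in pair_counts.items()
        if f_pair >= min_freq
    }
    return sorted(scores, key=scores.get, reverse=True)


# Invented toy counts for illustration only; these are NOT real COCA figures.
pair_counts = {"errands": 40, "the": 900, "amuck": 12, "away": 300, "rare": 3}
word_counts = {"errands": 1_500, "the": 20_000_000, "amuck": 60,
               "away": 200_000, "rare": 50_000}
print(rank_by_mi(pair_counts, word_counts, f_node=150_000, corpus_size=450_000_000))
# → ['amuck', 'errands', 'away', 'the']
```

In this toy data "the" co-occurs with the node most often but ranks last, while the rare "amuck" ranks first. MI tends to favor rare words, which is one reason to set a minimum frequency such as MIN FREQ 1 = "10".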

2. Comparing word meanings
What are the top five nouns ([nn*]) (in CONTEXT) that occur with quick but not with fast, and vice versa? (Use COMPARE WORDS, set MIN FREQ 1 to "10" and check the box, set Left/Right = 3/3, and set SORT = RELEVANCE.) (fast: 6 = breaks, 7 = modem; quick: 6 = reference, 7 = peek)
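A comparison like this ranks each noun by how much more often it appears near one word than near the other. The sketch below assumes a simple ratio of the two collocate counts, with a minimum-frequency filter and a small smoothing constant so that a zero count cannot cause division by zero. COCA's own COMPARE WORDS scoring may differ, and the function name, parameters, and counts are all hypothetical.

```python
def compare_collocates(counts_a, counts_b, min_freq=10, top_n=5, smoothing=0.5):
    """Return the collocates that most favor word A over word B.

    counts_a, counts_b -- dicts mapping collocate -> frequency near word A / word B
    min_freq           -- ignore collocates seen fewer times than this with word A
    smoothing          -- added to the word-B count so a zero count cannot divide by zero
    """
    ratios = {
        word: f_a / (counts_b.get(word, 0) + smoothing)
        for word, f_a in counts_a.items()
        if f_a >= min_freq
    }
    return sorted(ratios, key=ratios.get, reverse=True)[:top_n]


# Invented toy counts of nouns near "quick" and "fast"; NOT real COCA data.
quick_nouns = {"peek": 200, "glance": 300, "reference": 150, "food": 80, "car": 20}
fast_nouns = {"peek": 0, "glance": 5, "reference": 1, "food": 400, "car": 60}

print(compare_collocates(quick_nouns, fast_nouns))  # quick but not fast
# → ['peek', 'reference', 'glance', 'car', 'food']
print(compare_collocates(fast_nouns, quick_nouns))  # vice versa
# → ['food', 'car']
```

Swapping the arguments gives the "vice versa" list. A raw-count ratio ignores the fact that quick and fast differ in overall frequency; a more careful score would first divide each count by the total frequency of its word.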

TIME:

3. What's decreased, what's increased
What are five verbs ([VVI]) that were used a lot in the 1920s-1940s but much less in the 1980s-2000s (and vice versa)? (Set MIN FREQ 1 to "5" and check the box.) (earlier: 6 = whitewash, 7 = clamor; later: 6 = tape, 7 = log)

4. When did it peak?
In which decade was each of the following most common?

  • far-out

  • funky

  • beauteous

  • nifty

  • freak out

  • cinemaddict*

  • global warming

  • hippy/hippies

  • political* correct*

  • reds
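When the decades of a historical corpus contain different numbers of words, "most common" is best read as the highest frequency per million words rather than the highest raw count. The sketch below shows that normalization; the function names and all figures are hypothetical and are not answers to the questions above.

```python
def per_million(hits, section_size):
    """Frequency per million words in one section of the corpus."""
    return hits * 1_000_000 / section_size


def peak_decade(hits_by_decade, words_by_decade):
    """Return the decade where the term is most frequent per million words."""
    return max(hits_by_decade,
               key=lambda d: per_million(hits_by_decade[d], words_by_decade[d]))


# Invented figures for a hypothetical search term; NOT real TIME Corpus data.
words_by_decade = {"1960s": 12_000_000, "1970s": 10_000_000, "1980s": 9_000_000}
hits_by_decade = {"1960s": 120, "1970s": 110, "1980s": 45}

print(peak_decade(hits_by_decade, words_by_decade))  # → 1970s
```

Here the raw count peaks in the 1960s (120 hits), but the normalized frequency peaks in the 1970s (11 vs. 10 per million) because the 1960s section is larger.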