LEXICOGRAPHICAL STUDY

Note: the following project deals with English.  If you'd like to do something similar for another language, please let me know.


For this project, you'll look at the definition of a relatively common word and see whether corpus data could help to improve the definition.  Here's how we'll do it:

1. Identify a singular noun (NN1) or an infinitive verb (VVI) with a frequency between 5000 and 10000 in the American Corpus.

2. Next, look up the word in the online American Heritage Dictionary.

3. See how well the definition agrees with the actual corpus data:

A) Are there any meanings of the word that are found in the corpus, but not in the dictionary?

B) Conversely, are there any meanings that are found in the dictionary, but not in the seven-million-word corpus?

C) Are the main idiomatic expressions (phraseology) accounted for in the definition?

D) Does the order of meanings given in the dictionary roughly follow the relative frequency of those meanings in the corpus? Check this both across parts of speech (e.g. account as N and V) and within a given part of speech (e.g. account = "bank account", or "the account that he gave...").
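If you save a frequency list or label your concordance lines by hand, a short script can take care of the bookkeeping for steps 1 and 3. The Python below is a minimal sketch, not a required tool: it assumes you have typed your data into plain lists yourself (rows of word, tag, frequency for step 1, and one sense label per concordance line for step 3). The function names, sense labels, and numbers are all hypothetical, made up only to show how the pieces fit together; they are not real corpus counts.

```python
from collections import Counter

def candidate_words(freq_list, tags=("NN1", "VVI"), low=5000, high=10000):
    """Step 1: keep (word, tag, freq) rows with an NN1 or VVI tag and a
    frequency in [low, high]. Treating the bounds as inclusive is an assumption."""
    return [(w, t, f) for (w, t, f) in freq_list if t in tags and low <= f <= high]

def sense_gaps(dictionary_senses, corpus_senses):
    """Questions A and B: senses seen only in the corpus, and senses
    listed only in the dictionary."""
    corpus_only = sorted(set(corpus_senses) - set(dictionary_senses))
    dictionary_only = sorted(set(dictionary_senses) - set(corpus_senses))
    return corpus_only, dictionary_only

def sense_ranking(sense_labels):
    """Question D: count one hand-assigned label per concordance line and
    return (label, count) pairs, most frequent first (ties keep first-seen order)."""
    return Counter(sense_labels).most_common()

def order_matches(dictionary_order, corpus_ranking):
    """Question D: True if the senses shared by both sources appear in the
    same relative order in the dictionary and in the corpus ranking."""
    corpus_order = [label for label, _ in corpus_ranking]
    shared = [s for s in dictionary_order if s in corpus_order]
    return shared == [s for s in corpus_order if s in shared]

# Hypothetical, made-up input purely for illustration.
rows = [("alpha", "NN1", 7200), ("beta", "VVI", 12000),
        ("gamma", "NN1", 4000), ("delta", "VVD", 6000)]
labels = ["financial record", "narrative", "financial record", "consideration"]
dictionary = ["narrative", "financial record", "importance"]

print(candidate_words(rows))
print(sense_gaps(dictionary, labels))
ranking = sense_ranking(labels)
print(ranking)
print(order_matches(dictionary, ranking))
```

The script only counts what you give it; deciding which sense each concordance line belongs to, and whether two dictionary senses really differ, is still your job.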


Here are some practice words, which we'll do in class:

ring, kill, mark, treat, tie, cast, strike