PROJECT: COLLOCATES (Due 3 PM, Thursday, Feb 22)
Send the assignment to me, with LING 485: Project 6 in the subject line (including the colon and the right capitalization).
Also, please name your file in the following way: Project + underscore + uncapitalized lastname, e.g.
1. Choose a moderately frequent word from English, maybe between #2000 and #4000 in frequency rank. Also, please number the sections below in your responses.
2. Before you start looking for the word in corpora, write down 4-5 of the most frequent collocates that you think of, and get the same data from two other people (i.e. "What words do you think of when you think of ---?").
3. Search for the word in
3.1 What are the 7-8 most common collocates to the left? (indicate span)
3.2 What are the 7-8 most common collocates to the right? (indicate span)
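If it helps to see what a left/right collocate search is doing, here is a minimal Python sketch. It is only an illustration, not how the corpus interface is implemented: the function name `collocates`, the toy sentence, and the simple whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter


def collocates(tokens, node, span=4):
    """Count words within `span` tokens to the left and right of each hit of `node`."""
    left, right = Counter(), Counter()
    node = node.lower()
    tokens = [t.lower() for t in tokens]
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Words in the window immediately before the node word.
        left.update(tokens[max(0, i - span):i])
        # Words in the window immediately after the node word.
        right.update(tokens[i + 1:i + 1 + span])
    return left, right


# Toy text (hypothetical); a real corpus would have millions of tokens.
text = "strong coffee and strong tea but weak coffee with strong opinions"
left, right = collocates(text.split(), "coffee", span=2)
print("left: ", left.most_common(8))
print("right:", right.most_common(8))
```

Changing `span` changes which words count as collocates, which is why the questions above ask you to indicate the span you used.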
4. Which direction (left or right) gives the best results? Why?
5. What are 2-3 words (even beyond the 7-8 listed above) that are a surprise? How are they used with the node word?
6. Are there 2-3 words in the list (you might need to go fairly far down) that seem to be "errors"? Are the errors of the type "abortion / supreme" or "rove / presidential"?
7. In the searches above, you didn't limit by part of speech. Completely reset the form, and now redo the search, limiting by one part of speech that you think might be useful. How does this compare with the results in
8. Reset the form, and this time sort by Mutual Information score, with a minimum frequency of 5 or 10.
8.1 What are the top 7-8 collocates now?
8.2 Which search seemed to produce the best results: #3 above (sorting by raw frequency, with MI threshold) or sorting by Mutual Information?
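To see why the minimum frequency matters when sorting by Mutual Information, here is a rough Python sketch. It uses the standard pointwise formula MI = log2((pair frequency x corpus size) / (node frequency x collocate frequency)); the corpus interface may compute its score somewhat differently (for example, by also adjusting for span), so treat this as an assumption. The function names and all of the counts are made up for illustration.

```python
import math


def mutual_information(pair_freq, node_freq, coll_freq, corpus_size):
    """Pointwise MI: how much more often the pair occurs than chance would predict."""
    return math.log2(pair_freq * corpus_size / (node_freq * coll_freq))


def rank_by_mi(pair_freqs, node_freq, word_freqs, corpus_size, min_freq=5):
    """Score each collocate by MI, dropping pairs below the minimum frequency."""
    scored = [
        (word, mutual_information(freq, node_freq, word_freqs[word], corpus_size))
        for word, freq in pair_freqs.items()
        if freq >= min_freq
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts for a node word that occurs 1,000 times in a 1,000,000-word corpus.
pair_freqs = {"the": 400, "brew": 8, "xylo": 2}      # co-occurrences with the node word
word_freqs = {"the": 50_000, "brew": 20, "xylo": 2}  # overall corpus frequencies

for word, score in rank_by_mi(pair_freqs, 1_000, word_freqs, 1_000_000, min_freq=5):
    print(f"{word}\t{score:.2f}")
```

In this toy data the very frequent word "the" co-occurs most often but gets a low MI score, while the rare "xylo" would get the highest score of all if the minimum frequency did not filter it out. That trade-off is what the comparison between raw frequency and MI sorting is meant to bring out.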
9. Compare the collocates in COCA with those in iWeb. Give 3-4 words that are high frequency, but different in each corpus. Try to explain
#1-8 above. No need to do #9.
Can get word from
Portuguese Word and Phrase
ANY OTHER LANGUAGE (use
"Word Sketch" at
Do #2-8 above, as well as possible. Some adaptations:
3-4. Use "subject", "object",
8. Use "Sort by frequency" and "Sort by score"