PROJECT: COLLOCATES (Due 3 PM, Thursday, Feb 22)

Please send the assignment to me with "LING 485: Project 6" in the subject line (including the colon and the exact capitalization).

Also, please name your file as follows: project number + underscore + last name in lowercase, e.g. 6_snodgrass


1. Choose a moderately frequent English word, perhaps one ranked somewhere between #2000 and #4000 in frequency. Also, please number your responses to match the sections below.

2. Before you start looking for the word in corpora, write down the 4-5 collocates that you think are most frequent, and get the same data from two other people (i.e. "What words do you think of when you think of ---?").

3. Search for the word in COCA.

3.1 What are the 7-8 most common collocates to the left? (indicate the span you used)

3.2 What are the 7-8 most common collocates to the right? (indicate the span you used)
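To make "span" concrete: a collocate search counts the words that occur within a given number of positions to the left or right of each occurrence of the node word. The Python sketch below does this on a tiny made-up sentence. The function name `span_collocates` and the example text are hypothetical; they only illustrate the idea and are not how COCA itself is implemented.

```python
from collections import Counter

def span_collocates(tokens, node, left=4, right=4):
    """Count the words within `left` positions before and `right` positions
    after each occurrence of `node` in a list of tokens."""
    left_counts, right_counts = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left_counts.update(tokens[max(0, i - left):i])    # words to the left
        right_counts.update(tokens[i + 1:i + 1 + right])  # words to the right
    return left_counts, right_counts

# A made-up toy "corpus", already lowercased and split into words.
tokens = "the strong coffee was hot and the strong tea was hot too".split()

left_hits, right_hits = span_collocates(tokens, "strong", left=1, right=1)
print("Left (span 1): ", left_hits.most_common(3))
print("Right (span 1):", right_hits.most_common(3))
```

In this toy example, a span of 1 on the left finds only the function word "the", while the right side finds content words ("coffee", "tea"); widening the span changes what gets counted.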

4. Which direction (left or right) gives the best results? Why?

5. What are 2-3 words (even beyond the 7-8 listed above) that are surprising? How are they used with the node word?

6. Are there 2-3 words in the list (you might need to go fairly far down) that seem to be "errors"? Are the errors of the type "abortion / supreme" or "rove / presidential"?

7. In the searches above, you didn't limit by part of speech. Completely reset the form and re-do the search, this time limiting it to one part of speech that you think might be useful. How do these results compare with those in #3 above?

8. Reset the form and this time sort by Mutual Information (MI) score, with a minimum frequency of 5 or 10.

8.1 What are the top 7-8 collocates now?

8.2 Which search produced better results: #3 above (sorting by raw frequency, with an MI threshold) or #8.1 (sorting by Mutual Information)?
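It may help to see why MI and raw frequency rank collocates so differently. A commonly cited span-adjusted form of MI compares the observed co-occurrence count with the count expected by chance: MI = log2((f(node, collocate) * corpus size) / (f(node) * f(collocate) * span)). The sketch below assumes this formula (COCA's exact computation may differ). All counts are invented for illustration, and `mutual_information` and `rank_by_mi` are hypothetical names.

```python
import math

def mutual_information(f_pair, f_node, f_coll, corpus_size, span):
    """Span-adjusted MI (log base 2): observed pair count vs. the count
    expected if node and collocate were independent within the span."""
    expected = f_node * f_coll * span / corpus_size
    return math.log2(f_pair / expected)

def rank_by_mi(pair_counts, word_freqs, f_node, corpus_size, span, min_freq=5):
    """Drop collocates seen fewer than min_freq times, then sort by MI."""
    scored = []
    for coll, f_pair in pair_counts.items():
        if f_pair < min_freq:
            continue
        mi = mutual_information(f_pair, f_node, word_freqs[coll], corpus_size, span)
        scored.append((coll, f_pair, round(mi, 2)))
    return sorted(scored, key=lambda row: row[2], reverse=True)

# Invented counts for illustration only (not real COCA data).
corpus_size = 1_000_000
f_node = 1000
span = 8  # e.g. 4 words to the left + 4 to the right
pair_counts = {"the": 600, "coffee": 40, "aroma": 6, "zzz": 3}
word_freqs = {"the": 60_000, "coffee": 2_000, "aroma": 50, "zzz": 10}

for coll, freq, mi in rank_by_mi(pair_counts, word_freqs, f_node, corpus_size, span):
    print(f"{coll:8} freq={freq:4}  MI={mi}")
```

By raw frequency the order of these invented collocates would be "the", "coffee", "aroma"; by MI it is reversed. Without `min_freq=5`, the very rare "zzz" would top the MI list, which illustrates why a minimum frequency is useful when sorting by MI.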

9. Compare the collocates in COCA with those in iWeb. Give 3-4 high-frequency collocates that differ between the two corpora, and try to explain these differences.


SPANISH OR PORTUGUESE

Do #1-8 above. No need to do #9.

You can get a word from Spanish or Portuguese Word and Phrase.

ANY OTHER LANGUAGE (use "Word Sketch" at Sketch Engine)

Do #2-8 above as well as you can, with these adaptations:

3-4. Instead of left and right spans, use the grammatical relations ("subject", "object", "modifier", etc.).

8. Use both "Sort by frequency" and "Sort by score", and compare the results.
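Sketch Engine describes its word-sketch "score" as logDice, defined as 14 + log2(2 * f(node, collocate) / (f(node) + f(collocate))), with a maximum value of 14. The sketch below assumes that definition (check Sketch Engine's own documentation for the statistic your version uses). The counts are invented for illustration, and `log_dice` is a hypothetical helper.

```python
import math

def log_dice(f_pair, f_node, f_coll):
    """logDice association score: 14 + log2(2*f_pair / (f_node + f_coll))."""
    return 14 + math.log2(2 * f_pair / (f_node + f_coll))

# Invented counts for illustration only (not real Sketch Engine data).
f_node = 1000
pair_counts = {"the": 600, "coffee": 40, "aroma": 6, "zzz": 3}
word_freqs = {"the": 60_000, "coffee": 2_000, "aroma": 50, "zzz": 10}

by_frequency = sorted(pair_counts, key=pair_counts.get, reverse=True)
by_score = sorted(
    pair_counts,
    key=lambda w: log_dice(pair_counts[w], f_node, word_freqs[w]),
    reverse=True,
)
print("Sort by frequency:", by_frequency)
print("Sort by score:    ", by_score)
```

With these made-up numbers, logDice does not push the very rare "zzz" to the top the way an unfiltered MI ranking can, so "Sort by score" may behave differently from sorting by MI in COCA.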