PROJECT: WEB AS CORPUS
 

Please send your project to Sara White and to me, with "LING 485: Project" in the subject line.


Possible topics. Do just ONE of the following:
 

Orthography
Pick a number of words that are often misspelled, that vary in spelling, or that have correct and incorrect usages (see 1 and 2).  How do the differences vary across domains (e.g. .edu, .com, .gov, .uk, .ca)? Briefly sample the types of pages (company websites, individuals' own homepages, etc.) that have the "aberrant" usages/spellings.

Grammar
Choose a "non-standard" grammatical construction and see how it varies across domains:

     I'm like so
     so not interested
     who to talk with / with whom to talk
     had went / had saw

Semantics / collocates
Pick a relatively uncommon word (most such words will still have tens of thousands of hits on the Web). You might use WordandPhrase.Info to find one (and don't take words with hyphens). Sample the first 70-80 occurrences and see which words collocate with it most frequently. How does this compare to COCA or the BNC?
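
One way to tally collocates from the sampled occurrences is sketched below in Python. This is only an illustration, not a required method: the function name, the window of four words on each side, the stopword list, the node word "assuage", and the three sample lines are all invented for the example. In practice you would paste in the 70-80 snippets you collected from the Web.

```python
import re
from collections import Counter

def collocates(lines, node, window=4, stopwords=frozenset()):
    """Count words occurring within `window` tokens of `node` (hypothetical helper)."""
    counts = Counter()
    node = node.lower()
    for line in lines:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", line.lower())
        for i, tok in enumerate(tokens):
            if tok == node:
                lo = max(0, i - window)
                neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + window]
                counts.update(t for t in neighbours if t not in stopwords)
    return counts

# Invented sample lines; replace with your own sampled occurrences.
sample = [
    "The committee tried to assuage public fears about the merger.",
    "Nothing could assuage her guilt.",
    "He hoped the apology would assuage their fears.",
]
stop = {"the", "to", "her", "their", "he", "would", "could", "about"}
top = collocates(sample, "assuage", stopwords=stop)
print(top.most_common(5))
```

The resulting counts can then be set beside the collocate lists that COCA or the BNC report for the same word.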

Dialect
Choose a feature (lexical, grammatical, or other) that you know or suspect differs from one country to another (e.g. .ac.uk vs .edu, or .es vs .mx vs .ar, or .de vs .at, or .pt vs .br).  You can also use "Region" in Google Advanced Search. E.g.:

            candy          put              ratio    per 1,000 put
American    850,000,000    4,410,000,000    .1927    192.7
British     41,600,000     472,000,000      .0881    88.1
        = American rate is about 2.19 times the British rate (about 119% more)
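
The arithmetic behind this example can be sketched in Python. The hit counts are the ones from the table above. The function name is made up for illustration, and using "put" as the baseline word simply follows the example.

```python
def per_thousand(target_hits, baseline_hits):
    """Hits for the target word per 1,000 hits of the baseline word."""
    return 1000 * target_hits / baseline_hits

# Hit counts copied from the candy/put table.
american = per_thousand(850_000_000, 4_410_000_000)
british = per_thousand(41_600_000, 472_000_000)

# How many times larger the American rate is than the British rate.
relative = american / british

print(f"American: {american:.1f} per 1,000")
print(f"British:  {british:.1f} per 1,000")
print(f"American rate is {relative:.2f} times the British rate")
```

Dividing by a common baseline word matters because the two regions have very different total numbers of pages, so raw hit counts alone are not comparable.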