PROJECT: WEB AS CORPUS
 

Possible topics:
 

Orthography
Pick a number of words that are often misspelled, that vary in spelling, or that have a correct/incorrect usage (see 1 and 2). How does the frequency of each variant vary across domains (e.g. .edu, .com, .gov, .uk, .ca)? Take a brief sample of the types of pages (company websites, individuals' own homepages, etc.) that contain the "aberrant" usages/spellings.
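Once you have recorded hit counts for each variant in each domain, comparing each variant's share of the domain's total, rather than raw counts, makes domains of very different sizes easier to compare. A minimal Python sketch, assuming you type the counts in by hand; the function name variant_shares and the numbers in the example are hypothetical placeholders, not real results:

```python
from typing import Dict


def variant_shares(counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """For each domain, return each variant's share of that domain's total hits."""
    shares: Dict[str, Dict[str, float]] = {}
    for domain, variant_counts in counts.items():
        total = sum(variant_counts.values())
        # Skip domains where none of the variants was found (avoids division by zero).
        if total == 0:
            continue
        shares[domain] = {variant: n / total for variant, n in variant_counts.items()}
    return shares


if __name__ == "__main__":
    # Hypothetical placeholder counts; substitute the hit counts you actually record.
    recorded = {
        ".edu": {"definitely": 900, "definately": 100},
        ".com": {"definitely": 800, "definately": 200},
    }
    for domain, domain_shares in variant_shares(recorded).items():
        print(domain, {v: round(p, 3) for v, p in domain_shares.items()})
```

Domains where no variant turns up at all are simply left out of the result.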

Grammar
Choose a "non-standard" grammatical construction and see how its frequency varies across domains:

     I'm like so
     so not interested
     who to talk with / with whom to talk
     had went / had saw
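To keep queries consistent across constructions and domains, it can help to generate them systematically. This sketch assumes exact-phrase quotation marks and Google's site: operator (e.g. site:.edu), which should give the same restriction as the "specify domain" option; the helper name domain_queries is hypothetical:

```python
from typing import List, Tuple


def domain_queries(phrases: List[str], domains: List[str]) -> List[Tuple[str, str, str]]:
    """Pair every phrase with every domain as an exact-phrase, domain-restricted query."""
    queries = []
    for phrase in phrases:
        for domain in domains:
            # Quotation marks request the exact phrase; site: restricts results to the domain.
            queries.append((phrase, domain, f'"{phrase}" site:{domain}'))
    return queries


if __name__ == "__main__":
    constructions = ["had went", "had saw", "so not interested"]
    for phrase, domain, query in domain_queries(constructions, [".edu", ".com", ".gov"]):
        print(f"{domain:6} {query}")
```

Pasting each generated query into the search box and noting the reported count gives one number per construction per domain.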

Semantics / collocates
Pick a relatively uncommon word (most such words will still have tens of thousands of hits on the Web). Sample the first 70-80 occurrences and see which words collocate with it most frequently. How does this compare with the BNC (British National Corpus)?
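Tallying collocates by hand over 70-80 snippets is tedious; if you paste the snippets into a list, a few lines of Python can count the words within a fixed window of the target word. This is a sketch under stated assumptions: a window of 4 words on each side by default, crude lower-cased tokenization that keeps only ASCII letters and apostrophes, and an optional stop list. The function name collocates and the example snippets are illustrative only:

```python
import re
from collections import Counter
from typing import Iterable


def collocates(snippets: Iterable[str], node: str, window: int = 4,
               stopwords: frozenset = frozenset()) -> Counter:
    """Count words within `window` tokens to the left or right of `node`."""
    counts: Counter = Counter()
    node = node.lower()
    for snippet in snippets:
        # Crude tokenization: lower-case, keep runs of ASCII letters and apostrophes.
        tokens = re.findall(r"[a-z']+", snippet.lower())
        for i, token in enumerate(tokens):
            if token != node:
                continue
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(w for w in context if w not in stopwords)
    return counts


if __name__ == "__main__":
    # Illustrative snippets only; paste your own search-result snippets here.
    snippets = [
        "The evidence was scant at best.",
        "There is scant evidence for this claim.",
        "They paid scant attention to it.",
    ]
    print(collocates(snippets, "scant", window=2).most_common(3))
```

Without a stop list, the top collocates will usually be function words such as "the" and "of", so it is worth deciding whether to filter them before comparing with the BNC.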

Dialect
Choose a feature (lexical, grammatical, or otherwise) where you know or suspect there is a difference from one country to another (e.g. .ac.uk vs .edu, .es vs .mx vs .ar, .de vs .at, or .pt vs .br). Use Google's "specify domain" feature to compare the feature's frequency in the different countries.
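Raw hit counts are hard to compare across countries because the national domains differ greatly in size, and Google's reported counts are estimates. One possible workaround, offered here as an assumption rather than a required method, is to divide each count by the hit count for a very frequent control word in the same language and domain. The function name normalized_rates and the numbers in the example are hypothetical:

```python
from typing import Dict


def normalized_rates(target_hits: Dict[str, int], baseline_hits: Dict[str, int],
                     per: int = 1_000_000) -> Dict[str, float]:
    """Express target hits per `per` hits of a baseline (control) query, domain by domain."""
    rates: Dict[str, float] = {}
    for domain, hits in target_hits.items():
        base = baseline_hits.get(domain, 0)
        # Domains with no baseline count cannot be normalized, so they are left out.
        if base > 0:
            rates[domain] = hits * per / base
    return rates


if __name__ == "__main__":
    # Hypothetical placeholder counts; replace with your own searches, e.g. the target
    # form and a very common control word, each restricted to .es and to .mx.
    target = {".es": 50, ".mx": 200}
    baseline = {".es": 1_000_000, ".mx": 2_000_000}
    for domain, rate in normalized_rates(target, baseline).items():
        print(f"{domain}: {rate:.1f} per million baseline hits")
```

Because the counts are estimates, small differences between countries should be treated with caution.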