PROJECT: WEB AS CORPUS
Possible topics:
Orthography
Pick a number of words that are often misspelled, that vary in spelling, or
that have correct and incorrect usages (see 1 and 2). How do the differences
vary across different domains (e.g. .edu, .com, .gov, .uk, .ca)? Briefly sample
the types of pages (company websites, individuals' own homepages, etc.) that
have the "aberrant" usages/spellings.
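One way to make the domain comparison concrete is to turn the raw hit counts into the share of hits that use the variant spelling, so that large and small domains can be compared directly. The following is only a minimal sketch: the function name `variant_share` is hypothetical, and the counts are invented placeholders, not real search results.

```python
def variant_share(hits):
    """Return, per domain, the fraction of hits that use the variant spelling.

    `hits` maps a domain to a (standard_count, variant_count) pair.
    Domains with no hits at all are skipped to avoid dividing by zero.
    """
    shares = {}
    for domain, (standard, variant) in hits.items():
        total = standard + variant
        if total == 0:
            continue
        shares[domain] = variant / total
    return shares


# Placeholder counts, invented purely for illustration; replace them with
# the hit counts you actually record for each domain.
example_hits = {
    ".edu": (900, 100),
    ".com": (600, 400),
}

for domain, share in sorted(variant_share(example_hits).items()):
    print(f"{domain}: {share:.1%} variant")
```

Comparing shares rather than raw counts keeps a very large domain such as .com from dominating simply because it has more pages.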
Grammar
Choose a "non-standard" grammatical construction and see how it varies
across domains:
I'm like so
so not interested
who to talk with / with whom to talk
had went / had saw
Semantics / collocates
Pick a relatively uncommon word (most such words will still have tens of
thousands of hits on the Web). Sample the first 70-80 occurrences and see
which words it collocates with most frequently. How does this compare to the
BNC (British National Corpus)?
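Once the 70-80 sampled occurrences are pasted into a list of text snippets, the collocates can be tallied automatically. The sketch below is one possible approach, not a prescribed method: the function name `collocates`, the three-word window, and the sample snippets are all assumptions made for illustration.

```python
import re
from collections import Counter


def collocates(lines, node, window=3, stopwords=frozenset()):
    """Count words appearing within `window` tokens of `node`.

    `lines` is an iterable of text snippets (e.g. sampled search hits).
    Matching is case-insensitive and the node word itself is not counted.
    """
    node = node.lower()
    counts = Counter()
    for line in lines:
        # Words, optionally with internal apostrophes (e.g. "don't").
        tokens = re.findall(r"\w+(?:'\w+)*", line.lower())
        for i, token in enumerate(tokens):
            if token != node:
                continue
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            for neighbour in left + right:
                if neighbour != node and neighbour not in stopwords:
                    counts[neighbour] += 1
    return counts


# Invented snippets standing in for sampled Web hits.
sample = [
    "a torrid love affair",
    "the torrid summer heat",
    "Torrid affair rumours",
]
counts = collocates(sample, "torrid", window=2, stopwords={"a", "the"})
for word, n in counts.most_common(5):
    print(word, n)
```

Passing a stopword set keeps function words such as "the" from crowding out the more informative collocates; the same window size should be used when comparing against the BNC.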
Dialect
Choose a feature (lexical, grammatical, or whatever) that you know or suspect
differs from one country to another (e.g. .ac.uk vs .edu, or .es vs .mx vs
.ar, or .de vs .at, or .pt vs .br). Use the "specify domain" feature of Google
to compare the frequency in the different countries.
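If you prefer typing queries directly, the domain restriction can usually be written with Google's `site:` operator. The sketch below simply builds one exact-phrase query string per domain; it assumes that `site:` behaves like the "specify domain" feature, and the function name `domain_queries` and the example phrase are hypothetical.

```python
def domain_queries(phrase, domains):
    """Build one exact-phrase query per domain using the `site:` operator."""
    return {domain: f'"{phrase}" site:{domain}' for domain in domains}


# Example phrase and domain pair chosen only for illustration.
for domain, query in domain_queries("had went", [".edu", ".ac.uk"]).items():
    print(domain, "->", query)
```

The resulting strings are meant to be pasted into the search box by hand; record the hit count reported for each domain and compare them.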