OVERVIEW

Reading: Hunston, pp. 3-23, 213-16

  • What gives a corpus value for researchers?

  • Give a few examples of the relationship between frequency and register (text type)

  • Give a few examples of corpus insights into phraseology and collocation

    • cause, fathom, diametrically, naked eye

  • What are some general uses of corpora? Which interest you the most?

    • teaching

    • translation

    • historical

    • forensic

    • (see corpus.byu.edu; interesting uses)

  • What are some different types of corpora? Which would be the most useful for you?

    • general

    • historical

    • specialized (e.g. genre: spoken, academic)

    • parallel

    • comparable

    • learner

  • Define the following: type, token, hapax, lemma, word-form, tag, parse, annotation

    • the man put the books on the table (8 tokens, 6 types)

    • COCA for lemma, POS (no grouping)

  • Give some examples where corpora provide even native speakers with insights that might not otherwise be available.

    • cause

    • naked eye

    • go/come + ADJ

    • help (to)

  • [ACTIVITY] Hunston claims that native speaker intuition usually isn’t very good at guessing frequency and/or collocation. Try answering the following questions in your head, and only then compare your intuitions with actual data from the Corpus of Contemporary American English:

    • What is the relative frequency of the following verbs: look, live, like, get, take? (Enter each one separately and limit the search to the infinitival form of the verb, [vvi]; e.g. like.[vvi])

    • What is the relative frequency of the following adjectives: important, big, other, only, hard?

    • What adjectives occur most frequently with painfully and with completely? (e.g. painfully [j*])

    • What verbs occur most frequently with slowly and with hardly? (e.g. [vvd] slowly) (note: [vvd] = past-tense (-ed) form of a lexical, non-auxiliary verb)

    • What verbs occur most frequently in the phrase hard to V? (hard to [vvi]) (note: [vvi] = infinitival form of a lexical verb)

  • What considerations do we need to keep in mind in interpreting corpus data?

    • ± frequent does not = ± possible (mauve carpet)

    • value of corpus and possible data a function of corpus design (e.g. CREA)

    • little non-textual context (oh, sure)

    • gives frequency; we interpret (go/come + ADJ)

  • (p213) What does Hunston mean when she says that corpora can be both authoritarian and empowering?

    • empowering: see what really happens

    • authoritarian: not in the BNC / COCA; not "English" (e.g. Nigeria)

  • What does Hunston mean when she says that corpora have made language analysis more simple, as well as more complex?

    • preposition stranding ([vv*] with): (vs. prescriptive rule) genre, time (after WW II), by verb, other factors?
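
The counting behind the type/token definitions and the collocate searches in the activity can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how COCA works internally: tokens are taken to be lowercase, whitespace-separated words, and the helper names type_token_stats and right_collocates, as well as the small tagged sample, are hypothetical and invented for illustration (the sample is not COCA data). The tags imitate the CLAWS-style tags used in the COCA queries (e.g. jj for an adjective), so the tag prefix "j" stands in for the wildcard [j*].

```python
from collections import Counter


def type_token_stats(text):
    """Return (token count, type count, sorted hapaxes) for a text.

    Assumption: tokens are lowercase whitespace-separated words; real corpus
    tools also handle punctuation, clitics and multiword units.
    """
    tokens = text.lower().split()
    freqs = Counter(tokens)                      # each distinct form is one type
    hapaxes = sorted(w for w, n in freqs.items() if n == 1)  # types seen once
    return len(tokens), len(freqs), hapaxes


def right_collocates(tagged, node, tag_prefix):
    """Count words immediately after `node` whose tag starts with `tag_prefix`.

    `tagged` is a list of (word, tag) pairs (hypothetical format).
    A prefix of "j" roughly mimics a COCA search such as painfully [j*].
    """
    counts = Counter()
    for (word, _), (next_word, next_tag) in zip(tagged, tagged[1:]):
        if word == node and next_tag.startswith(tag_prefix):
            counts[next_word] += 1
    return counts


tokens, types, hapaxes = type_token_stats("the man put the books on the table")
print(tokens, types, hapaxes)
# 8 6 ['books', 'man', 'on', 'put', 'table']

# Hypothetical hand-tagged sample, invented for illustration only.
sample = [("painfully", "rr"), ("slow", "jj"), ("and", "cc"),
          ("painfully", "rr"), ("shy", "jj"), ("and", "cc"),
          ("painfully", "rr"), ("slow", "jj"), ("and", "cc"),
          ("painfully", "rr"), ("aware", "jj")]
print(right_collocates(sample, "painfully", "j").most_common())
```

Running it on the example sentence reproduces the 8 tokens and 6 types noted above. No lemmatization is attempted, so books and book would count as separate types; grouping word-forms under a lemma is exactly what a lemma search in COCA adds.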

Note: the assigned reading does not cover the following questions, so you don't need to prepare them before class, but I'll discuss them in class anyway:

  • What is the difference between a rationalist and an empirical approach to language?

  • What is the difference between competence and performance? Which one did Chomsky favor, and why?

  • Discuss the issue of introspection (vs. external data), and how it relates to corpus linguistics (conferences)

  • What was the situation with data processing in the 1950s-1970s, and how did this impact on corpus linguistics?

  • How have advances in data processing aided the resurgence of corpus linguistics? (my Dad)

QUIZ

1. Concordance: total/totally, interested/interesting, fast/slow, catch/caught

2. Types of corpora NOT mentioned: parallel, learner, religious, historical

3. Terms NOT mentioned: regularizing, tagging, parsing, annotation

4. Harder (p213-6): might as well, in terms of, for all [pronoun] know, under the influence