SYNTAX I


Prescriptive

Topics: Dr. Grammar

Topics


Lexical or grammatical?

Conrad

  1. Most grammar = +/- acceptable. In CL, "degrees" of acceptability

    1. I think [that] she's nice

    2. has been being V-ed

    3. NP was Ved PREP by NP

    4. IO (Mary, Africa)

    5. require to be V-ed
       

  2. Tie in with functional and cognitive. Not abstract theory, like in generative grammar.

  3. Lexico-grammatical patterns (e.g. [v*] to / [v*] that)

  4. Semantic prosody: e.g. get passive, go/come + ADJ

  5. Relationship to genres: passive, perfect, progressive

  6. Grammar of speech: traditionally not covered (ELang notes)

Biber

  1. Corpus -- 40 million words (see page 8) -- spoken and written, American and British

  2. Real examples

  3. Coverage of language variation (e.g. 1 | 2 pronouns vs. proper Ns; NP modification)

  4. Coverage of preference and frequency (e.g. passives)

  5. Interpretations of frequency: context and discourse

  6. Dialect less important for grammatical purposes than register

  7. Issue of standard vs. non-standard grammar (might could)

  8. Prescriptive vs. descriptive (who/whom, prep stranding, split infinitive)

Hunston: "local grammar" (cf Construction Grammar)

  • the NOUN of the NOUN

  • the NOUN of _vvg

  • VERB that PRON

  • VERB to VERB

  • it BE * NOUN that

One construction vs. many (Biber vs mine)

My work

Spanish/Portuguese (why infinitives):

  1. Causatives (hacer, mandar, dejar, ver, oír)

  2. Clitic climbing (LO quiero / deseo hacerLO)

  3. Subject raising (parecer)

English

  1. to V / V-ing (started to walk / walking) (small corpora page)

  2. V-ing construction (they talked him into going)

  3. +/- for: I'd like (for) you to put them in order

  4. Linguistic Vanguard (byu_ling)

Setting up the searches (precision / recall)

  • passive (all forms vs are|were)

  • who / whom

  • will.[v*] vs will [v*]
     

  • quotative like (and I'm like 'no way')

  • I have not NP vs I don't have NP

  • I think (that) they'll do it

  • I saw the man (that) you talked to
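The precision / recall trade-off in these searches can be sketched with a toy example. Everything here is an assumption made for illustration: the seven sentences are invented and hand-labelled (they are not corpus data), and the two regular expressions are only rough stand-ins for a narrow search (are|were + V-ed) and a broader search (all forms of BE + participle). A real search in the corpora would use PoS tags rather than word endings.

```python
import re

# Invented, hand-labelled sample: True means the sentence contains a passive.
SAMPLE = [
    ("The results were published last year.", True),
    ("The book is read widely.", True),
    ("She is interested in syntax.", False),   # adjectival, labelled as not passive
    ("The window was broken by the storm.", True),
    ("We were walking home.", False),
    ("The report has been written.", True),
    ("She wrote the report.", False),
]

# Narrow search: only "are" / "were" followed by a word ending in -ed.
NARROW = re.compile(r"\b(are|were)\s+\w+ed\b")
# Broad search: any form of BE followed by a word ending in -ed or -en.
BROAD = re.compile(r"\b(am|is|are|was|were|be|been|being)\s+\w+(ed|en)\b")

def precision_recall(pattern, sample):
    """Return (precision, recall) of a search pattern against labelled sentences."""
    tp = fp = fn = 0
    for text, is_passive in sample:
        hit = pattern.search(text) is not None
        if hit and is_passive:
            tp += 1          # found, and really a passive
        elif hit:
            fp += 1          # found, but not a passive
        elif is_passive:
            fn += 1          # a passive that the search missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for name, pattern in (("narrow", NARROW), ("broad", BROAD)):
    p, r = precision_recall(pattern, SAMPLE)
    print(f"{name}: precision = {p:.2f}, recall = {r:.2f}")
```

On this toy sample the narrow search returns only real passives but misses most of them, while the broad search finds more passives at the cost of a false hit (is interested). Neither search catches the irregular participle in "is read", which is the kind of gap that a tag-based search is meant to close.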

Using the corpora:

  • Finding PoS tags in BYU corpora via no-group

  • Sketch Engine: CQP (basic / tagset (English))

CHAPTER TOPIC
3 Preposition stranding (He's the one I was talking to)
4 Pronouns and gender (he or she)
5 Phrasal verbs: frequency of "separated" verbs: look (up) the word (up)
6 Can / may (can / may I use the phone?)
6 Passive (was studied)
6 Get passive (got run over (vs. was run over))
6 Progressive (is watching)
6 Perfect (HAVE seen)
6 Combinations of perfect, passive, progressive (has been watching, was being considered, etc)
6 Future (going to verb / will verb)
6 Will / shall (I will / shall consider five factors...)
6 Semi-modals (need to, have to, ought to, etc.)
6 Modals: frequency of different modals
7 Comparatives (sillier / more silly)
7 Go / come ADJ (go crazy, come clean)
7 Get ADJ(ed)
8 Contraction (they simply cannot / can't do it)
8 No / not negation (I don't have any reason / I have no reason)
9 Frequency of [nn*] [nn*] (the breakfast cereal ad campaign)
10 +/- that (I guess (that) they're not coming)
10 begin / start + INF / V-ING (started watching / to watch)
  Like: and she's like "I'm not going out with him"
  so not ADJ: I'm so not going out with her / he's so not the kind of guy I like

With any feature, there will be some difference between genres, time periods, or dialects. The question is whether this difference is statistically significant. To determine this, you can use chi-square. (I should mention that there are some problems with using chi-square with the types of large numbers that you get with these corpora. But we'll ignore that for the time being.)

Example #1: For +/- "to" in the construction "help someone (to) verb", the data from the BNC and COCA are as follows:

 

          American   British
+to           2230      1581
-to          16220      3122
% -to          88%       66%

Plugging the four frequency counts (the +to and -to rows) into the chi-square calculator, we get a "p-value" of 0 (in practice, a value too small for the calculator to display), which is below .05. So yes, the difference is significant.
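For anyone who wants to see what the calculator is doing, here is a minimal sketch of the same test in Python, using the four counts from Example #1. It assumes a plain Pearson chi-square with no continuity correction (the notes do not say which variant the online calculator uses), and the function name chi_square_2x2 is made up for this illustration.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table of counts, without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: +to, -to; columns: American, British (counts from Example #1).
help_to = [[2230, 1581], [16220, 3122]]
stat, p = chi_square_2x2(help_to)
print(f"chi-square = {stat:.1f}, p = {p:.3g}, significant at .05: {p < 0.05}")
```

The p-value that comes out is not literally zero, just far smaller than anything a calculator would display. A library routine such as SciPy's chi2_contingency would normally be used instead of hand-rolled code; note that it applies a continuity correction to 2x2 tables by default, so its statistic can differ slightly.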

Example #2: With "going to VERB" vs. "will VERB" in the five genres of COCA, we get:

 

                   SPOK      FIC      MAG     NEWS     ACAD
will [v*]        155791    67245   144578   182891   104482
going to [v*]    209335    46999    26512    41795     6113
% going to          57%      41%      15%      19%       6%


Plugging the ten frequency counts (the "will" and "going to" rows) into the chi-square calculator, we get a "p-value" of 0 (again, too small for the calculator to display), which is again below .05. So yes, the differences are again significant.

Example #3: With "accustomed to [vvi]" (accustomed to watch) vs. "accustomed to [vvg]" (accustomed to watching) in the different decades of the TIME Corpus, we get:

 

          1920s  1930s  1940s  1950s  1960s  1970s  1980s  1990s  2000s
V            36     64     23     10      6     11      5      5      2
V-ing        17     31     38     48     32     44     30     28      8
% V-ing     32%    33%    62%    83%    84%    80%    86%    85%    80%

 

If we plug the frequency counts (the V and V-ing rows) into the chi-square calculator, we once more get a "p-value" of 0, which is again significant. This makes sense, because there is a big increase in V-ing from the 1930s to the 1950s. But if we include only the numbers from the 1950s-2000s, the p-value rises to .98, which is not below .05, so the differences across those decades are not statistically significant.
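The comparison between all decades and just the 1950s-2000s can be checked with a short script. This is only a sketch under stated assumptions: it uses a plain Pearson chi-square, computes the p-value with a standard series / continued-fraction evaluation of the incomplete gamma function, and the function names chi_square_stat and chi_square_p are made up for this illustration. The counts are the ones from Example #3.

```python
import math

def chi_square_stat(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_p(stat, df):
    """Upper-tail probability of the chi-square distribution, i.e. Q(df/2, stat/2)."""
    a, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    scale = math.exp(-x + a * math.log(x) - math.lgamma(a))
    if x < a + 1:
        # Series expansion of the lower incomplete gamma function.
        term = total = 1.0 / a
        denom = a
        for _ in range(10000):
            denom += 1
            term *= x / denom
            total += term
            if term < total * 1e-15:
                break
        return 1.0 - total * scale
    # Continued fraction (modified Lentz) for the upper incomplete gamma function.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * scale

# Counts from Example #3, 1920s through 2000s.
v     = [36, 64, 23, 10, 6, 11, 5, 5, 2]
v_ing = [17, 31, 38, 48, 32, 44, 30, 28, 8]

all_stat, all_df = chi_square_stat([v, v_ing])
all_p = chi_square_p(all_stat, all_df)

# Same test restricted to the 1950s-2000s (the last six decades).
late_stat, late_df = chi_square_stat([v[3:], v_ing[3:]])
late_p = chi_square_p(late_stat, late_df)

print(f"all decades: chi-square = {all_stat:.1f}, df = {all_df}, p = {all_p:.3g}")
print(f"1950s-2000s: chi-square = {late_stat:.2f}, df = {late_df}, p = {late_p:.2f}")
```

Several of the later decades have small counts (for example, only 2 tokens of V in the 2000s), so the result for that sub-table should be read with some caution.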

Sample research (with partner)

V that: I guess (that) they'll be here at 10

N that: the guy (that) you saw