Sign up

A note on grades: As explained in class, some topics are much harder than others, and it's not fair to someone who spent 4-5 hours on a hard topic to get the same grade as someone who spent 10-15 minutes getting their data. But . . .just because you choose a 100 point (possible) topic, it doesn't mean that your score will be higher than someone who chooses a 94 point topic - at all. The actual points are like diving:

score = difficulty  x  execution

It's very possible that someone who chooses a 94 point topic will "ace" it (.94 difficulty x .95 execution = 89.3 score). On the other hand, there will likely be all kinds of issues / problems in the "execution" of a 100 point topic (1.00 difficulty x .85 execution = 85.0 score). Your choice, but you've been warned :-) . . . .
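If you want to compare topics before you choose, the arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name paper_score is made up for this example, and it assumes that difficulty is the topic's possible points and execution is a fraction between 0 and 1.

```python
def paper_score(difficulty_points, execution):
    """Diving-style score: possible points for the topic, scaled by execution (0-1)."""
    return difficulty_points * execution

# The two cases from the paragraph above
print(round(paper_score(94, 0.95), 1))   # 94-point topic, "aced"
print(round(paper_score(100, 0.85), 1))  # 100-point topic with execution problems
```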


The electronic version of the paper is due to me (mark_davies@byu.edu) two weeks after we finish the chapter in which the topic is found. For example, if we finish Chapter 4 on Jan 26, then any topics from Chapter 4 would be due no later than 11:59 PM on Feb 6. The subject line should be 325 paper -- no more, no less. The email should have a Word document attached, with a filename in the format last name + first name, e.g. jones_fred.docx. There will be a 10% penalty for papers that are turned in late but within the first 24 hours after they are due, and then 10% off for each additional day.

Note that you can receive an extra two points (e.g. a 91 becomes a 93) if you turn it in within three days of the end of the chapter.


 

To do the project, you can use the following, or any other corpus that you'd like (clear it with me, though, before you use corpora that aren't from this list):

In the paper, you should consider:

 

#     Pages        Question                                                                         Corpora
1     .25-.5       What do Biber (or others) have to say about the topic, based on their corpus?
2     .75-1.00     Any differences between the five genres in American English?                     COCA
3     .75-1.00     Any difference between British and American English?                             COCA / BNC
4     .75-1.00     How have things changed over the last 100-200 years?                             COHA / TIME
(5)   (.75-1.00)   Is the variation a function of particular lexical items?                         (COCA)

Please format the page as follows:

  • 1.5 line spacing

  • 1" margins

  • 12 pt font
     

  • PLEASE indicate EXACTLY what search strings you used

  • Don't be afraid to use charts / tables ("a picture speaks a thousand words"). If you do use them, though, please briefly explain what they mean.

  • Citations are not a big deal. This is not a library paper, but rather a corpus-based paper, based primarily on data that you have collected.


Using Excel / ratios

If you are comparing two constructions (e.g. will/shall, or have proven/proved), then the chart should show the RATIO of the two constructions. Please don't just give the frequency of A and then B, and expect me to create a chart in my mind showing the ratio. To calculate ratios, do the following:

                       Column A
Row 1   Feature A      30               <- Cell A1
Row 2   Feature B      70               <- Cell A2
Row 3   % feature B    =(A2)/(A1+A2)

After creating the ratio formula in one cell, just copy and paste it to the other cells in that row. Then highlight the cells in that "ratio" row and choose "Insert" and then "Column" or "Line" (whichever kind of chart you want) to create the chart.
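If you prefer to check the Excel result outside of Excel, the same ratio can be sketched in Python. This is a minimal illustration, not a required step: the function name share_of_b is invented for this example, and the counts are the ones from the worksheet example above.

```python
def share_of_b(count_a, count_b):
    """Share of feature B among A + B, like the Excel formula =(A2)/(A1+A2)."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("both counts are zero, so the ratio is undefined")
    return count_b / total

# Counts from the worksheet example: Feature A = 30, Feature B = 70
ratio = share_of_b(30, 70)
print(f"% feature B = {ratio:.0%}")
```

To cover a whole row of columns (genres, decades, dialects), you would call the function once per column, just as you copy the Excel formula across the row.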
 


(Optional)

With any feature, there will be some difference between genres, time periods, or dialects. The question is whether this difference is statistically significant. To determine this, you can use the chi-square test. (I should mention that there are some problems with using chi-square with the types of large numbers that you get with these corpora. But we'll ignore that for the time being.)

Example #1: With +/- "to" in the construction "help someone (to) verb", the data from COCA (American) and the BNC (British) is as follows:

          American   British
+ to      2230       1581
- to      16220      3122
% - to    88%        66%

Plugging the four raw frequencies (the yellow cells) into the chi-square calculator, we get a "p-value" of 0, which is really low and well below .05. So yes, the difference is significant.
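For anyone curious about what the calculator is doing with a two-by-two table like the one in Example #1, here is a hedged sketch in Python. It assumes a plain Pearson chi-square test with one degree of freedom and no continuity correction; the online calculator may differ in such details, and the function name chi_square_2x2 is made up for this illustration.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: + to, - to; columns: American, British (the counts from Example #1)
chi2, p = chi_square_2x2(2230, 1581, 16220, 3122)
print(f"chi-square = {chi2:.1f}, p = {p:.3g}")
print("significant at .05" if p < 0.05 else "not significant at .05")
```

The p-value here is not literally zero, just so small that a calculator displaying a few decimal places shows it as 0.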

Example #2: With "going to VERB" vs. "will VERB" in the five genres of COCA, we get:

                  SPOK      FIC      MAG       NEWS      ACAD
will [v*]         155791    67245    144578    182891    104482
going to [v*]     209335    46999    26512     41795     6113
% going to        57%       41%      15%       19%       6%


Plugging the ten raw frequencies (the yellow cells) into the chi-square calculator, we again get a "p-value" of 0, which is again well below .05. So yes, the differences are again significant.

Example #3: With "accustomed to [vvi]" (accustomed to watch) vs. "accustomed to [vvg]" (accustomed to watching) in the different decades of the TIME Corpus, we get:

          1920s   1930s   1940s   1950s   1960s   1970s   1980s   1990s   2000s
V         36      64      23      10      6       11      5       5       2
V-ing     17      31      38      48      32      44      30      28      8
%V-ing    32%     33%     62%     83%     84%     80%     86%     85%     80%

 

If we plug the raw frequencies (the yellow cells) into the chi-square calculator, we once more get a "p-value" of 0, which is again significant. This makes sense, because there is a big increase in V-ing from the 1930s to the 1950s. But if we include only the numbers from the 1950s-2000s, then the p-value increases to .98, which is not below .05, and the difference is therefore not statistically significant.
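The comparison between all nine decades and just the 1950s-2000s can also be sketched in Python, for tables with any number of columns. As before, this assumes a plain Pearson chi-square test of independence, which may not match the online calculator in every detail; the function names are invented for this illustration, and the p-value is computed with only the standard library.

```python
import math

def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 1."""
    half = x / 2
    # Closed forms for df = 1 (odd case) and df = 2 (even case)
    if df % 2:
        p, k = math.erfc(math.sqrt(half)), 1
    else:
        p, k = math.exp(-half), 2
    # Step up two degrees of freedom at a time:
    # Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        p += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return p

def chi_square_test(table):
    """Pearson chi-square test of independence for a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, chi2_upper_tail(chi2, df)

# Counts from Example #3, 1920s-2000s
v = [36, 64, 23, 10, 6, 11, 5, 5, 2]
v_ing = [17, 31, 38, 48, 32, 44, 30, 28, 8]

for label, start in (("1920s-2000s", 0), ("1950s-2000s", 3)):
    chi2, df, p = chi_square_test([v[start:], v_ing[start:]])
    print(f"{label}: chi-square = {chi2:.2f}, df = {df}, p = {p:.3g}")
```

Under these assumptions, the second line of output should land close to the .98 reported above, while the first should be far below .05.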