A note on grades:
As explained in class, some topics are much harder than
others, and it's not fair for someone who spent 4-5 hours on
a hard topic to get the same grade as someone who spent
10-15 minutes getting their data. But just because you
choose a 100 point (possible) topic, it doesn't mean at all
that your score will be higher than that of someone who
chooses a 94 point topic. The actual points are like diving:
score = difficulty x execution
It's very possible
that someone who chooses a 94 point topic will "ace" it (.94
difficulty x .95 execution = 89.3 score). On the other hand,
there will likely be all kinds of issues / problems in the
"execution" of a 100 point topic (1.00 difficulty x .85
execution = 85.0 score). Your choice, but you've been warned
:)
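As a rough illustration of the diving-style formula, the two cases above can be worked through in a few lines of Python. This is only a sketch of the arithmetic; the function name `paper_score` is made up for this example and is not part of the course materials.

```python
def paper_score(difficulty, execution):
    """Return the score on a 100-point scale: difficulty x execution."""
    return round(difficulty * execution * 100, 1)


# Both values are proportions between 0 and 1, as in the examples above.
easier_topic = paper_score(0.94, 0.95)  # 94 point topic, well executed
harder_topic = paper_score(1.00, 0.85)  # 100 point topic, more problems

print(easier_topic, harder_topic)
```

The point of the comparison is that the well-executed 94 point topic ends up ahead of the 100 point topic with execution problems.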
The electronic version of the paper is due to
me (mark_davies@byu.edu)
two weeks after we finish the chapter in which the topic is
found. For example, if we finish Chapter 4 on Jan 26, then
any topics from Chapter 4 would be due no later than 11:59
PM on Feb 6. The subject line should be
"325 paper", no more, no less. The email should have
a Word document attached, with a filename in the format
last name + first name, e.g.
jones_fred.docx. There will be a 10% penalty for papers that
are turned in late but within the first 24 hours after the
deadline, and then 10% off for each additional day.
Note that you can
receive an extra two points (e.g. a 91 becomes a 93) if
you turn it in within three days of the end of the chapter.
To do the project, you can use the
following, or any other corpus that you'd like (clear it with me,
though, before you use corpora that aren't from this list):
In the paper, you should consider:

| # | Pages | Question | Corpora |
|---|---|---|---|
| 1 | .25-.5 | What do Biber (or others) have to say about the topic, based on their corpus? | |
| 2 | .75-1.00 | Any differences between the five genres in American English? | COCA |
| 3 | .75-1.00 | Any difference between British and American English? | COCA / BNC |
| 4 | .75-1.00 | How have things changed over the last 100-200 years? | COHA / TIME |
| (5) | (.75-1.00) | Is the variation a function of particular lexical items? | (COCA) |

Please format the page as follows:

1.5 line spacing

1" margins

12 pt font

PLEASE
indicate EXACTLY what search strings you used

Don't be afraid to use charts / tables
("a picture speaks a thousand words"). If you do use them,
though, please briefly explain what they mean.

Citations are not a big deal. This is
not a library paper, but rather a corpus-based paper, based
primarily on data that you have collected.
Using Excel / ratios
If you are comparing two constructions (e.g. will/shall, or have
proven/proved), then the chart should show the RATIO of the two
constructions. Please don't
just give the frequency of A and then B, and expect me to
create a chart in my mind showing the ratio. To calculate ratios, do the following:


| | | Column A | |
|---|---|---|---|
| Row 1 | Feature A | 30 | <- Cell A1 |
| Row 2 | Feature B | 70 | <- Cell A2 |
| Row 3 | % feature B | =(A2)/(A1+A2) | |

After creating the ratio formula in one cell, just copy and
paste it to the other cells in that row. Then highlight the
cells in that "ratio" row and choose "Insert" and then "Column"
or "Line" (whichever kind of chart you want) to create the
chart.
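For anyone who prefers to check the spreadsheet formula outside Excel, the same ratio can be sketched in Python. The function name `percent_feature_b` is invented for this illustration; the two frequencies are the ones from the sample table (30 and 70).

```python
def percent_feature_b(freq_a, freq_b):
    """Share of feature B among all tokens of A and B, like =(A2)/(A1+A2)."""
    return freq_b / (freq_a + freq_b)


ratio = percent_feature_b(30, 70)
print(f"{ratio:.0%}")
```

Calling the function once per column (genre, decade, or dialect) corresponds to copying the formula across the "ratio" row.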
(Optional)
With any feature, there will be some
difference between genres, time periods, or dialects. The question
is whether this difference is statistically significant. To
determine this, you can use the chi-square
test. (I should mention that there are
some problems with using chi-square with the types of large
numbers that you get with these corpora. But we'll ignore that for
the time being.)
Example #1: With +/- "to" in the
construction "help someone (to) verb", the data
from the BNC and COCA are:

| | American | British |
|---|---|---|
| + to | 2230 | 1581 |
| - to | 16220 | 3122 |
| % - to | 88% | 66% |

Plugging the four frequencies (the "+ to" and "- to" counts for
each dialect) into the
chi-square calculator, we get a "p-value" of 0 (that is, a value
too small to display), which is below .05. So yes, the difference
is significant.
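To see roughly what the calculator is doing with a 2 x 2 table, here is a minimal Python sketch using the four frequencies from Example #1. It assumes a plain Pearson chi-square with no continuity correction; an online calculator may apply one, so its statistic can differ slightly. The variable names are chosen for this illustration only.

```python
import math

# Observed frequencies from Example #1.
# Rows: "+ to", "- to". Columns: American, British.
observed = [[2230, 1581],
            [16220, 3122]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

# A 2 x 2 table has 1 degree of freedom, and for 1 degree of freedom the
# upper-tail probability is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.1f}, p = {p_value:.3g}")
```

The p-value here is not literally zero, just far too small for a calculator to display with a few decimal places.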
Example #2: With "going to VERB" vs. "will
VERB" in the five genres of COCA, we get:

| | SPOK | FIC | MAG | NEWS | ACAD |
|---|---|---|---|---|---|
| will [v*] | 155791 | 67245 | 144578 | 182891 | 104482 |
| going to [v*] | 209335 | 46999 | 26512 | 41795 | 6113 |
| % going to | 57% | 41% | 15% | 19% | 6% |

Plugging the ten frequencies (the "will" and "going to" counts
for each genre) into the
chi-square calculator, we get a "p-value" of 0
(again too small to display), which is again below .05.
So yes, the differences are again significant.
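The percentage row and the chi-square test for Example #2 can be reproduced with a short Python sketch. It assumes the percentage row is "going to" as a share of both constructions in each genre, and it uses the closed-form tail probability that holds for 4 degrees of freedom (2 rows x 5 columns). Variable names are illustrative.

```python
import math

genres = ["SPOK", "FIC", "MAG", "NEWS", "ACAD"]
will_v = [155791, 67245, 144578, 182891, 104482]
going_to_v = [209335, 46999, 26512, 41795, 6113]

# Percentage of "going to" out of both constructions, per genre.
percent_going_to = [round(100 * g / (w + g)) for w, g in zip(will_v, going_to_v)]

# Pearson chi-square over the 2 x 5 table of observed frequencies.
grand_total = sum(will_v) + sum(going_to_v)
chi_square = 0.0
for row in (will_v, going_to_v):
    for count, w, g in zip(row, will_v, going_to_v):
        expected = sum(row) * (w + g) / grand_total
        chi_square += (count - expected) ** 2 / expected

# (2 - 1) * (5 - 1) = 4 degrees of freedom. For df = 4 the upper-tail
# probability is exp(-x / 2) * (1 + x / 2).
half = chi_square / 2
p_value = math.exp(-half) * (1 + half)

for genre, pct in zip(genres, percent_going_to):
    print(genre, f"{pct}%")
print(f"chi-square = {chi_square:.1f}, p = {p_value:.3g}")
```

With counts this large the statistic is enormous and the p-value underflows to 0.0 in floating point, which illustrates the caveat about using chi-square with very large corpus frequencies.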
Example #3: With "accustomed to [vvi]"
("accustomed to watch") vs. "accustomed to [vvg]" ("accustomed to
watching") in the different decades of the TIME Corpus, we get:

| | 1920s | 1930s | 1940s | 1950s | 1960s | 1970s | 1980s | 1990s | 2000s |
|---|---|---|---|---|---|---|---|---|---|
| V | 36 | 64 | 23 | 10 | 6 | 11 | 5 | 5 | 2 |
| Ving | 17 | 31 | 38 | 48 | 32 | 44 | 30 | 28 | 8 |
| % Ving | 32% | 33% | 62% | 83% | 84% | 80% | 86% | 85% | 80% |

If you plug the frequencies from the V and Ving rows
into the
chi-square calculator, you once more get a "p-value"
of 0, which is again significant. This makes sense, because
there is a big increase in Ving from the 1930s to the 1950s. But
if we just include the numbers from the 1950s-2000s, then
the p-value increases to .98, which is not below .05, and
therefore not statistically significant.
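The two comparisons in Example #3 can be sketched in Python with a small general-purpose chi-square routine. This is an illustration under stated assumptions, not the calculator's actual code: the helper names `chi_square_statistic` and `chi_square_p_value` are invented here, and the p-value uses the standard finite-sum formulas for a chi-square distribution with whole-number degrees of freedom. In practice a library routine such as `scipy.stats.chi2_contingency` does the same job.

```python
import math


def chi_square_statistic(table):
    """Return (Pearson chi-square, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)


def chi_square_p_value(x, df):
    """Upper-tail probability of a chi-square value for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction sum.
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    normal_pdf = math.exp(-half) / math.sqrt(2 * math.pi)
    return math.erfc(math.sqrt(half)) + 2 * normal_pdf * total


decades = ["1920s", "1930s", "1940s", "1950s", "1960s",
           "1970s", "1980s", "1990s", "2000s"]
v = [36, 64, 23, 10, 6, 11, 5, 5, 2]
ving = [17, 31, 38, 48, 32, 44, 30, 28, 8]

# All nine decades.
stat_all, df_all = chi_square_statistic([v, ving])
p_all = chi_square_p_value(stat_all, df_all)

# Only the 1950s through the 2000s.
start = decades.index("1950s")
stat_late, df_late = chi_square_statistic([v[start:], ving[start:]])
p_late = chi_square_p_value(stat_late, df_late)

print(f"all decades: chi-square = {stat_all:.1f}, df = {df_all}, p = {p_all:.3g}")
print(f"1950s-2000s: chi-square = {stat_late:.2f}, df = {df_late}, p = {p_late:.2f}")
```

Slicing the lists from the 1950s onward is the equivalent of plugging only those six columns into the calculator.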

