|
|
EDUCATION
|
|
I received a B.A. from Brigham Young University in 1986 with a double major
in Linguistics and Spanish, followed by an M.A. in Spanish Linguistics from
BYU in 1989. I then received a Ph.D. from the University of Texas at Austin
in 1992, with a specialization in "Ibero-Romance Linguistics".
|
PUBLICATIONS AND PRESENTATIONS
|
For the first ten years or so of my career, most of my publications dealt
with historical and genre-based variation in Spanish and Portuguese syntax.
Since then, however, they have increasingly dealt with general issues in
corpus design, creation, and use, especially with regard to English. Overall,
I have published nearly forty articles in these areas and given numerous
presentations at international conferences. [SEE VITA]
|
|
TEACHING
|
|
In Summer 2008 I'm teaching English Grammar (ELang 325). Other recent classes
include Empirical Methods in English Linguistics and Corpus Linguistics.
|
|
|
CORPUS OF CONTEMPORARY AMERICAN ENGLISH (COCA) (2008)
|
|
In 2008 I placed online a 360+ million word corpus of American English. This
is the only large-scale corpus of American English, and it is in fact the
largest (and hopefully most useful) structured corpus of any language freely
available on the web. The corpus contains twenty million words for each year
from 1990 to the present, with four million words per year in each of five
genres: spoken, fiction, popular magazines, newspapers, and academic. Best of
all, the corpus will be continually updated -- 20 million words each year --
from this point on.
|
FREQUENCY DICTIONARY OF PORTUGUESE (2007)
|
I created this dictionary in conjunction with Prof. Ana Preto-Bay of the
Department of Spanish and Portuguese at BYU. The dictionary is based on the
20 million words from the 1900s portion of the 45 million word Corpus do
Português. It is the first frequency dictionary of Portuguese based on a
large corpus spanning several different genres, and its format is quite
similar to that of the Frequency Dictionary of Spanish, discussed below. It
was published by Routledge in late 2007.
|
FREQUENCY DICTIONARY OF SPANISH (2005)
|
This frequency dictionary of Spanish was published in late 2005, and was the
first major frequency dictionary of Spanish published in English since 1964.
It is based on more than 20 million words from many different registers, and
includes many features not found in any previous dictionary of Spanish.
[MORE INFORMATION]
|
CORPUS DO PORTUGUÊS (2004-06)
|
|
In April 2004 I was awarded a two-year grant from the National Endowment for
the Humanities to create a corpus of historical Portuguese, in conjunction
with Prof. Michael Ferreira of Georgetown University. This corpus allows
users to compare the frequency, distribution, and use of words, phrases, and
grammatical constructions across different historical periods, registers, and
dialects of Portuguese.
|
REGISTER VARIATION IN SPANISH (2002-04)
|
|
In July 2002 I was awarded a two-year grant from the National Science
Foundation for the project "Multi-dimensional analysis of register variation
in Spanish". Working as co-PIs, Prof. Douglas Biber of Northern Arizona
University and I used large corpora of many different registers of Spanish
from the 1600s-1900s to explore the syntactic variation among these
registers.
|
|
CORPUS DEL ESPAÑOL (2001-02, MAJOR NEW RELEASE 2007)
|
|
In April 2001 I was awarded a 16-month grant from the National Endowment for
the Humanities to develop a 100 million word searchable corpus of historical
and modern Spanish texts on the web. Unlike other large corpora of Spanish,
my Corpus del Español allows users to perform advanced searches based on part
of speech, lemma, synonyms, and word and clause frequency.
|
OTHER PROJECTS
|
In the past year, I've also created a 400+ million word corpus of transcripts
of spoken American English (2000-present), as well as a 100 million word
corpus from an American magazine, 1920s-present. For reasons of copyright,
however, these are not currently available to others. If you're interested in
multilingual corpora, you might try a few that I've created: the Polyglot
Bible (Gospel of Luke in 30 languages) and the Latin-Old Spanish-Modern
Spanish Bible (entire text).
|
|
|
TECHNOLOGIES
|
|
In order to create large corpora and place them online, I have acquired
experience in a number of different technologies. These include database
organization and optimization (mainly with SQL Server, including advanced SQL
queries), web-database integration (ActiveX Data Objects), server-side
scripting (mainly Active Server Pages, via VBScript), client-side programming
(mainly DHTML / JavaScript), basic file and text manipulation (regular
expressions, batch files, etc.), and several different corpus and
text-related tools (like WordSmith and TextPad). I also maintain the hardware
and software for my three Windows Server 2003 machines, including the
administration of Internet Information Services.
|
|
INTERESTS
|
|
Beyond life at the university, my interests include comparative religion,
world cultures, world history (especially ancient and medieval), languages of
the world, and the implications of technology, including the Internet. And of
course I enjoy spending time with my family -- my wife Kathy, and "my three
sons" -- Spencer, Joseph, and Adam.
|