Mark Davies
Professor, Corpus Linguistics
Brigham Young University



In late 2005, Routledge published my Frequency Dictionary of Spanish: Core Vocabulary for Learners.  It is based on 20 million words taken largely from the 1900s portion of my 100-million-word Corpus del Español.  It was the first large-scale frequency dictionary of Spanish published in English in more than forty years, and the first to be based on:

     1) a large corpus (20 million words)
     2) a balanced corpus in terms of register (1/3 each of spoken, fiction, and non-fiction)
     3) a balanced corpus in terms of texts from both Spain and Latin America, and
     4) a carefully annotated and lemmatized corpus

The main index of the dictionary lists the 5,000 most common lemmas in Spanish in decreasing order of frequency.  Each entry in the main index contains:

rank frequency (1, 2, 3, …), headword, part of speech, English equivalent, sample sentence, range count, raw frequency total, indication of major register variation

As a concrete example, consider the entry for bruja "witch":

4305 bruja nf witch, hag / había una leyenda de una bruja que se montaba en una escoba 61-251 +f –nf

This entry shows that word number 4305 in the rank order list is [bruja], which is a feminine noun [nf] that can be translated as [witch, hag] in English.  We then see an actual sentence or phrase from the Corpus del Español that shows the word in context.  The two following numbers show that the word occurs in 61 of the 100 equally-sized blocks from the corpus (i.e. the range count), and that this lemma occurs 251 times in the corpus.  Finally, the [+f –nf] indicates that the word is much more common in the fiction register than would otherwise be expected, while it is less common in the non-fiction register.
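
For readers who want to work with entries like this programmatically, the following is a minimal sketch of how the fields described above could be pulled apart. The pattern is inferred only from the single bruja entry shown here, so it is an assumption that other entries follow the same layout; the names ENTRY_RE, Entry, and parse_entry are hypothetical and are not part of the dictionary or the corpus.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern, inferred from the one sample entry above:
# rank, headword, part of speech, gloss, "/", sample sentence,
# range-frequency pair, then optional register markers such as +f or –nf.
ENTRY_RE = re.compile(
    r"^(?P<rank>\d+)\s+"
    r"(?P<headword>\S+)\s+"
    r"(?P<pos>\S+)\s+"
    r"(?P<gloss>.+?)\s+/\s+"
    r"(?P<sample>.+?)\s+"
    r"(?P<range>\d+)-(?P<freq>\d+)"
    r"(?P<registers>(?:\s+[+\u2013-]\w+)*)\s*$"
)


@dataclass
class Entry:
    rank: int          # position in the frequency-ordered list
    headword: str
    pos: str           # part-of-speech code, e.g. "nf"
    gloss: str         # English equivalent(s)
    sample: str        # sample sentence or phrase
    range_count: int   # number of corpus blocks (out of 100) containing the lemma
    raw_freq: int      # total occurrences of the lemma
    registers: dict    # register code -> "+" (more common) or "-" (less common)


def parse_entry(line: str) -> Entry:
    """Split one main-index entry into its fields (sketch, not exhaustive)."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"entry does not match the assumed layout: {line!r}")
    registers = {}
    for marker in m["registers"].split():
        # The minus marker may be typeset as an en dash; normalize it to "-".
        registers[marker[1:]] = "+" if marker[0] == "+" else "-"
    return Entry(
        rank=int(m["rank"]),
        headword=m["headword"],
        pos=m["pos"],
        gloss=m["gloss"],
        sample=m["sample"],
        range_count=int(m["range"]),
        raw_freq=int(m["freq"]),
        registers=registers,
    )


entry = parse_entry(
    "4305 bruja nf witch, hag / había una leyenda de una bruja "
    "que se montaba en una escoba 61-251 +f –nf"
)
print(entry.headword, entry.range_count, entry.raw_freq, entry.registers)
# prints: bruja 61 251 {'f': '+', 'nf': '-'}
```

The sketch treats the first " / " as the boundary between the English equivalent and the sample sentence, so an entry whose gloss itself contained a spaced slash would need different handling.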

There are also indexes arranged alphabetically and by part of speech, both of which are tied in to the main frequency index.

