HISTORICAL I

Digital Historical Corpora: Architecture, Annotation, and Retrieval (Dagstuhl, Germany, 2006)

Historical Text Mining workshop (Lancaster, England, 2006)


ONE OF THE FOLLOWING:

Davies, Mark (forthcoming) “Advanced research on syntactic and semantic change with the 100 million word, fully-annotated Corpus del Español.” In Romance language corpora and historical linguistics, ed. Claus Pusch. Gunter Narr.

Kytö, Merja and Matti Rissanen (1993) “General introduction”. In Early English in the computer age, ed. Matti Rissanen, et al. Mouton de Gruyter. 1-17.


Davies, Mark (forthcoming) “Advanced research on syntactic and semantic change with the 100 million word, fully-annotated Corpus del Español.” In Romance language corpora and historical linguistics, ed. Claus Pusch. Gunter Narr.

List five types of searches (and give one simple example of each) that can be done with the Corpus del Español but cannot be done with any other corpus of historical Spanish, or with ANY other historical corpus, for that matter.

Kytö, Merja and Matti Rissanen (1993) “General introduction”. In Early English in the computer age, ed. Matti Rissanen, et al. Mouton de Gruyter. 1-17.

  1. How big is the corpus -- overall, and in the different historical periods?
  2. What about speling variashun (:-) and tagging?
  3. Describe briefly the coding of the texts.

Discuss briefly the issues and standards used with regard to each of the following criteria:

  1. Chronological coverage
  2. Regional coverage
  3. Sociolinguistic coverage
  4. Generic coverage

  • English corpora (Davies list) / Helsinki Corpus
  • Spanish/Portuguese (Mallorca)
  • Problems with text archives (Michigan 2006)
  • OED (Dagstuhl 2006)
  • TIME (ICAME 2007)

Using English historical corpora

1. OED / BECOME: Lexical: Find a word that appears for the first time in the 1700s. Then check LION and EEBO to see if the corpus data supports the OED claims.

2. New York Times: Lexical: Compare exceedingly, very, extremely, highly, totally in American English -- 1850-1900, 1900-1950, and 1950-2000.

3. BECOME: Grammatical: Compare "knows/knoweth NOT" vs "DOESN'T / DOES NOT know" (1500s-1900s).


BECOME:

1400s 3,273,000
1500s 67,014,000
1600s 371,434,000
1700s 49,165,000
1800s 193,017,000
1900s 23,532,000
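
Because the centuries in BECOME differ so much in size, raw hit counts from the exercises above are not directly comparable across periods. A minimal sketch of one common remedy, converting raw counts to a rate per million, is given below. It assumes the figures in the table are word counts (the table does not say so explicitly); the function name per_million and the hit counts in example_hits are hypothetical, invented only for illustration.

```python
# Period sizes copied from the BECOME table above.
# Assumption: these figures are word counts.
BECOME_SIZES = {
    "1400s": 3_273_000,
    "1500s": 67_014_000,
    "1600s": 371_434_000,
    "1700s": 49_165_000,
    "1800s": 193_017_000,
    "1900s": 23_532_000,
}


def per_million(raw_count, period, sizes=BECOME_SIZES):
    """Return the number of occurrences per million words in one period."""
    return raw_count * 1_000_000 / sizes[period]


# Hypothetical raw hit counts, NOT real corpus results.
example_hits = {"1600s": 37_143, "1900s": 4_706}

for period, hits in example_hits.items():
    rate = per_million(hits, period)
    print(f"{period}: {hits} raw hits = {rate:.1f} per million")
```

The same normalization applies to any period-by-period comparison, such as the three 50-year blocks in the New York Times exercise, provided the size of each block is known.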