ANNOTATION
 

Reading: Hunston, pp. 79-96

1) What are the advantages of a tagged corpus?
2) What are probabilistic and rule-based taggers, and how do they differ?
3) What is the role of the lexicon in tagging?
4) What types of words are the hardest to tag?
5) What is a parser? Which is harder, tagging or parsing? Why?
6) What are the advantages of using a parsed text?
7) In what way is a tagged corpus a "two-edged sword"?
8) Discuss the issue of "ad-hoc" annotation.
9) What is the difference between manual, computer-assisted, and automatic tagging? What are the benefits and drawbacks of each?


Use three of the following taggers to analyze five or six difficult sentences that you have created. Don't focus on the length of the sentences; focus instead on sentences containing words that are potentially ambiguous (e.g. "light", "mean", or "record", each of which can be a N, ADJ, or V).

CLAWS (the tagger used for the BNC, TIME, the BYU American Corpus, etc.)
VISL (several languages; select a language, then "Machine Analysis", then "Flat structure")
Linguist's Search Engine

Compare the output of these three taggers for 5-6 sentences.  Which one was the most accurate?  What type of words or sentences or structures could none of the taggers handle?  Bring your notes to class.
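
If you want to keep your comparison notes in a systematic form, the following is a minimal sketch in Python, not a required method. It assumes you have already copied each tagger's output by hand and mapped it to a simplified tagset (PRON, V, DET, ADJ, N). The names tagger_a, tagger_b, and tagger_c, the example sentence, and all of the tags shown are made-up placeholders, not real output from CLAWS, VISL, or the Linguist's Search Engine.

```python
def disagreements(tokens, outputs):
    """Return (token, {tagger: tag}) for each token the taggers tag differently."""
    for name, tags in outputs.items():
        if len(tags) != len(tokens):
            raise ValueError(f"{name}: expected {len(tokens)} tags, got {len(tags)}")
    rows = []
    for i, token in enumerate(tokens):
        tags_here = {name: tags[i] for name, tags in outputs.items()}
        if len(set(tags_here.values())) > 1:
            rows.append((token, tags_here))
    return rows


def accuracy(gold, outputs):
    """Share of tokens on which each tagger matches your own (gold) tags."""
    return {
        name: sum(g == t for g, t in zip(gold, tags)) / len(gold)
        for name, tags in outputs.items()
    }


# Hypothetical example sentence with two ambiguous words.
tokens = ["They", "record", "a", "light", "record"]

# Your own judgment of the correct tags (assumption: simplified tagset).
gold = ["PRON", "V", "DET", "ADJ", "N"]

# Placeholder outputs; replace with what each tagger actually returned.
outputs = {
    "tagger_a": ["PRON", "V", "DET", "ADJ", "N"],
    "tagger_b": ["PRON", "N", "DET", "ADJ", "N"],
    "tagger_c": ["PRON", "V", "DET", "N", "N"],
}

for token, tags_here in disagreements(tokens, outputs):
    print(token, tags_here)
print(accuracy(gold, outputs))
```

The list of disagreements points you to the words worth discussing in class, and the accuracy figures give a rough answer to "which one was the most accurate?" for your own sentences. Because each tagger uses its own tagset, the hand-mapping to a shared set of labels is itself a judgment call worth noting.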


BNC headers
(+ My way of doing it -- in relational databases)
Helsinki annotation

EAGLES
TEI (headers)