ANNOTATION
Hunston; 79-96
1) What are the advantages of a tagged corpus?
2) What is a probabilistic tagger, and what is a rule-based tagger?
3) What is the role of the lexicon in tagging?
4) What types of words are the hardest to tag?
5) What is a parser? Which is harder -- tagging or parsing? Why?
6) What are the advantages of using a parsed text?
7) In what way is a tagged corpus a "two-edged sword"?
8) Discuss the issue of "ad-hoc" annotation.
9) What are the differences between manual, computer-assisted, and automatic
tagging? What are the benefits and drawbacks of each one?
Use three of the following taggers to analyze five or six difficult
sentences that you have created. Don't focus on the length of the
sentences; focus instead on sentences containing words that are potentially
ambiguous (e.g. light, mean, or record as a N, ADJ, or V).
CLAWS (the tagger used for the BNC, TIME, BYU American Corpus, etc.)
VISL (several languages; select a language, then "Machine Analysis", then "Flat structure")
Linguist's Search Engine
Compare the output of these three taggers for 5-6 sentences. Which one was the most accurate?
What types of words, sentences, or structures could none of the taggers handle?
Bring your notes to class.
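If you would rather not compare the three outputs by eye, a short script can flag the words on which the taggers disagree. What follows is only a minimal sketch, not part of the assignment. It assumes you have copied each tagger's output by hand into a list of (word, tag) pairs and mapped each tagger's own tagset to a coarse set such as N, ADJ, V. The names tagger_a, tagger_b, tagger_c and find_disagreements are hypothetical, and the tags shown are typed-in placeholders, not real output from CLAWS, VISL, or the Linguist's Search Engine.

```python
def find_disagreements(outputs):
    """Return (position, word, {tagger: tag}) for each token tagged differently.

    `outputs` maps a tagger name to a list of (word, tag) pairs for one sentence.
    """
    names = list(outputs)
    # The comparison is token by token, so all taggers must have split
    # the sentence into the same number of tokens.
    if len({len(outputs[name]) for name in names}) != 1:
        raise ValueError("tokenization differs between taggers; align by hand first")

    disagreements = []
    for position, rows in enumerate(zip(*(outputs[name] for name in names))):
        word = rows[0][0]
        tags = {name: tag for name, (_, tag) in zip(names, rows)}
        if len(set(tags.values())) > 1:
            disagreements.append((position, word, tags))
    return disagreements


# Placeholder tags entered by hand for illustration; NOT real tagger output.
outputs = {
    "tagger_a": [("They", "PRON"), ("record", "V"), ("light", "ADJ"), ("music", "N")],
    "tagger_b": [("They", "PRON"), ("record", "N"), ("light", "ADJ"), ("music", "N")],
    "tagger_c": [("They", "PRON"), ("record", "V"), ("light", "N"), ("music", "N")],
}

for position, word, tags in find_disagreements(outputs):
    print(position, word, tags)
```

The script only shows where the taggers differ. Deciding which tag is correct, and noting the cases where all three agree on a wrong tag, still has to be done by hand.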
BNC headers
(+ My way of doing it -- in relational databases)
Helsinki annotation
EAGLES
TEI (headers)