| Attributes | | lemma | provides a lemma (base form) for the word, typically uninflected and serving both as an identifier (e.g. as a headword in dictionary contexts) and as a basis for potential inflections.| Status | Optional | | Datatype | teidata.text | <w lemma="wife">wives</w>
| <w lemma="Arznei">Artzeneyen</w>
|
| | lemmaRef | provides a pointer to a definition of the lemma for the word, for example in an online lexicon.| Status | Optional | | Datatype | teidata.pointer | <w type="verb"
lemma="hit"
lemmaRef="http://www.example.com/lexicon/hitvb.xml">hitt<m type="suffix">ing</m>
</w>
|
| | pos | (part of speech) indicates the part of speech assigned to a token (for example, whether it is a noun, adjective, or verb), usually according to some official reference vocabulary (e.g. for German: STTS, for English: CLAWS, for Polish: NKJP).| Status | Optional | | Datatype | teidata.text | The German sentence ‘Wir fahren in den Urlaub.’ tagged with the Stuttgart-Tuebingen-Tagset (STTS).<s>
<w pos="PPER">Wir</w>
<w pos="VVFIN">fahren</w>
<w pos="APPR">in</w>
<w pos="ART">den</w>
<w pos="NN">Urlaub</w>
<w pos="$.">.</w>
</s>
| The English sentence ‘We're going to Brazil.’ tagged with the CLAWS-5 tagset, arranged inline (with significant whitespace).<p><w pos="PNP">We</w><w pos="VBB">'re</w> <w pos="VVG">going</w> <w pos="PRP">to</w> <w pos="NP0">Brazil</w><pc pos="PUN">.</pc></p>
| The English sentence ‘We're going on vacation to Brazil for a month!’ tagged with the CLAWS-7 tagset and arranged sequentially.<p>
<w pos="PPIS2">We</w>
<w pos="VBR">'re</w>
<w pos="VVG">going</w>
<w pos="II">on</w>
<w pos="NN1">vacation</w>
<w pos="II">to</w>
<w pos="NP1">Brazil</w>
<w pos="IF">for</w>
<w pos="AT1">a</w>
<w pos="NNT1">month</w>
<pc pos="!">!</pc>
</p>
|
| | msd | (morphosyntactic description) supplies morphosyntactic information for a token, usually according to some official reference vocabulary (e.g. for German: STTS-large tagset; for a feature description system designed as (pragmatically) universal, see Universal Features).| Status | Optional | | Datatype | teidata.text | <ab>
<w pos="PPER"
msd="1.Pl.*.Nom">Wir</w>
<w pos="VVFIN"
msd="1.Pl.Pres.Ind">fahren</w>
<w pos="APPR"
msd="--">in</w>
<w pos="ART"
msd="Def.Masc.Akk.Sg">den</w>
<w pos="NN"
msd="Masc.Akk.Sg">Urlaub</w>
<pc pos="$."
msd="--">.</pc>
</ab>
|
| | join | when present, provides information on whether the token in question is adjacent to another, and if so, on which side.| Status | Optional | | Datatype | teidata.text | | Legal values are: | - no: the token is not adjacent to another
- left: there is no whitespace on the left side of the token
- right: there is no whitespace on the right side of the token
- both: there is no whitespace on either side of the token
- overlap: the token overlaps with another; other devices (specifying the extent and the area of overlap) are needed to more precisely locate this token in the character stream
| The example below assumes that the lack of whitespace is marked redundantly, by using the appropriate values of join.<s>
<pc join="right">"</pc>
<w join="left">Friends</w>
<w>will</w>
<w>be</w>
<w join="right">friends</w>
<pc join="both">.</pc>
<pc join="left">"</pc>
</s> Note that a project may decide to indicate the lack of whitespace in one direction only, or to mark it non-redundantly. The present proposal is the broadest possible, on the assumption of a "streamable view", in which all information about the current element must be represented locally. | The English sentence ‘We're going on vacation.’ tagged with the CLAWS-5 tagset and arranged sequentially, on the assumption that only the lack of preceding whitespace is indicated.<p>
<w pos="PNP">We</w>
<w pos="VBB"
join="left">'re</w>
<w pos="VVG">going</w>
<w pos="PRP">on</w>
<w pos="NN1">vacation</w>
<pc pos="PUN"
join="left">.</pc>
</p>
| | Note | The definition of this attribute is adapted from ISO MAF (Morpho-syntactic Annotation Framework), ISO 24611:2012. |
|
|
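The join values described above can be read as instructions for rebuilding running text from a token sequence. The following is a minimal sketch of that reading, not part of the specification: the function name `detokenize` is hypothetical, a token with no join attribute is assumed to be separated from its neighbours by a single space, and `overlap` is not handled (it is treated like `no`). Nested `<w>` elements are also not accounted for.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://www.tei-c.org/ns/1.0}'."""
    return tag.rsplit("}", 1)[-1]


def detokenize(xml_fragment: str) -> str:
    """Rebuild running text from <w>/<pc> tokens using their join values.

    Assumption: a missing join attribute means ordinary whitespace
    separation; join="overlap" is not interpreted here.
    """
    root = ET.fromstring(xml_fragment)
    tokens = [el for el in root.iter() if local_name(el.tag) in ("w", "pc")]

    pieces = []
    prev_join = "no"
    for index, token in enumerate(tokens):
        join = token.get("join", "no")
        # No space is emitted if this token says "nothing on my left"
        # or the previous token said "nothing on my right".
        glued = join in ("left", "both") or prev_join in ("right", "both")
        if index > 0 and not glued:
            pieces.append(" ")
        # itertext() also collects text of child elements such as <m>.
        pieces.append("".join(token.itertext()).strip())
        prev_join = join
    return "".join(pieces)


sample = """<s>
<pc join="right">"</pc>
<w join="left">Friends</w>
<w>will</w>
<w>be</w>
<w join="right">friends</w>
<pc join="both">.</pc>
<pc join="left">"</pc>
</s>"""

print(detokenize(sample))  # "Friends will be friends."
```

Because a space is suppressed when either neighbour signals adjacency, the same function handles both the redundant marking of the first join example and the one-directional marking of the second.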