
TEI Lex-0

— A baseline encoding for lexicographic data

8. Usage

Usage labelling is a lexicographic procedure which indicates that “a certain lexical item deviates in a certain respect from the main bulk of items described in a dictionary and that its use is subject to some kind of restriction”.

In the current TEI guidelines, <usg> is defined as an element which marks up “usage information in a dictionary entry”. Prototypically, usage information is a label which can be attached at various points in the entry hierarchy in order to signal restrictions in terms of geographic regions, domains of specialized language or stylistic properties for the particular lexical item that it is attached to.

8.1. Label-like vs. narrative usage descriptions

Usage information can be provided in dictionaries both in the form of label-like descriptors (often abbreviated) and as fuller narrative expressions.

Consider, for instance, the following senses taken from a German entry for Pflaume “plum”, where usage information is provided by labels drawn from fixed sets of values for stylistic and diatopic properties:

    <entry xml:id="pflaume" xml:lang="de"
     type="mainEntry">
     <form type="lemma">
      <orth>Pflaume</orth>
     </form>
     <sense n="1" xml:id="pflaume.1">
      <def xml:lang="de">Frucht des Pflaumenbaums</def>
      <def xml:lang="en">fruit of the plum tree</def>
     </sense>
     <sense n="2" xml:id="pflaume.2">
      <usg type="socioCultural"
       norm="colloquial">ugs.</usg>
      <def xml:lang="de">Pflaumenbaum</def>
      <def xml:lang="en">plum tree</def>
     </sense>
     <sense n="3" xml:id="pflaume.3">
      <usg type="socioCultural" norm="casual">salopp</usg>
      <usg type="socioCultural"
       norm="expletive">Schimpfwort</usg>
      <def xml:lang="de">ungeschickter, untauglicher Mensch</def>
      <def xml:lang="en">clumsy, incompetent person</def>
     </sense>
     <sense n="4" xml:id="pflaume.4">
      <usg type="geographic" norm="regional">landsch.</usg>
      <usg type="socioCultural" norm="casual">salopp</usg>
      <def xml:lang="de">anzügliche, leicht boshafte Bemerkung</def>
      <def xml:lang="en">suggestive, slightly malicious remark</def>
     </sense>
    </entry>

In contrast to the example above, the following sample features a more verbose usage description that does not rely on a fixed vocabulary. The sample is taken from a Serbian dialect dictionary. The dialect quote is further qualified by a usage hint, “(said by a peasant woman in the field in hot weather)”, which describes the particular context in which the quote was recorded.

    <cit type="example">
     <quote>„Ду́ни, ве́тре, се́јче леб да пе́че”</quote>
     <usg type="hint">(рекла сељанка на њиви за време врућине)</usg>
     <bibl>(<placeName>Дубница</placeName>).</bibl>
    </cit>

Example from Златановић (2017).

8.2. Types of usage

In TEI Lex-0, <usg> is a typed element and type is a mandatory attribute. The default value is hint: <usg type="hint"></usg>. The default value should be used when it is not possible to classify the usage label otherwise. The type of a <usg> should be thought of as a conceptual axis (independent of other types) along which the value of the element is located.

The following list of label types and their definitions is adapted from Salgado et al. 2019b:

  • temporal label: marker which identifies the use of a given lexical unit on a scale from old to new. Syn: diachronic marking; diachronic information; time label.
    <usg type="temporal"/>
  • geographic label: marker which identifies the place or region where a lexical unit is mainly used. Some dictionaries do not identify a specific place but indicate that the word is not used in every geographic area (e.g. regionalismo “regionalism” in Portuguese, or покр., the abbreviation of покрајински “provincial”, in Serbian). Syn: diatopic marking; diatopic information; region label.
    <usg type="geographic"/>
  • domain label: marker which identifies the specialized field of knowledge in which a lexical unit is mainly used. Syn: diatechnical marking; domain label; field label; subject field label; topic label.
    <usg type="domain"/>
  • frequency label: marker which identifies the relative rate of occurrence of a lexical unit in a given textual context. Syn: diafrequential marking; diafrequential information.
    <usg type="frequency"/>
  • textType label: marker which identifies the typical use of a lexical unit in a particular discourse type or genre. Syn: diatextual information.
    <usg type="textType"/>
  • attitude label: marker which identifies the speaker’s subjective point of view, positive or negative, regarding the object referred to by a given lexical unit. Syn: diaevaluative marking; diaevaluative information.
    <usg type="attitude"/>
  • socioCultural label: marker which identifies the use of a given lexical unit by particular social groups and/or in certain types of communicative situations depending on their level of formality. Syn: diaphasic marking; diaphasic information.
    <usg type="socioCultural"/>
  • meaningType label: marker which identifies a semantic extension of the sense of a given lexical unit.
    <usg type="meaningType"/>
  • normativity label: marker which identifies the use of a given lexical unit which is in some aspect considered to be non-standard or incorrect.
    <usg type="normativity"/>
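Because each type is an independent axis, a single sense may carry several <usg> elements of different types, each locating the item on its own axis. The following sketch is hypothetical: the xml:id, the printed labels and the norm values (archaic, law, formal) are invented for illustration and are not taken from a real dictionary.

    <sense xml:id="example.1">
     <!-- hypothetical sense: each usg is placed on a separate axis -->
     <usg type="temporal" norm="archaic">arch.</usg>
     <usg type="domain" norm="law">Law</usg>
     <usg type="socioCultural" norm="formal">formal</usg>
     <def xml:lang="en"><!--...--></def>
    </sense>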

The TEI Guidelines offer a range of sample values for type to illustrate potential uses of <usg>, but not all of them have been carried over to TEI Lex-0. The following table shows the differences between the suggested values of type in TEI P5 and the required values of type in TEI Lex-0:

TEI P5 (suggested types) | TEI Lex-0 (required types) | Example values
------------------------ | -------------------------- | ------------------------------------
time                     | temporal                   | archaic, old
geo                      | geographic                 | AmE., dial.
dom                      | domain                     | Med., Biol., Phys.
plev                     | frequency                  | rare, occas.
-                        | textType                   | bibl., poet., admin., journalese
-                        | attitude                   | derog., euph.
reg                      | socioCultural              | slang, vulgar, formal
style                    | meaningType                | fig. (= figurative), lit. (= literal)
-                        | normativity                | non-standard, incorrect
lang                     | -                          |
gram                     | -                          |
syn                      | -                          |
hyper                    | -                          |
colloc                   | -                          |
comp                     | -                          |
obj                      | -                          |
subj                     | -                          |
verb                     | -                          |
hint                     | hint                       |
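As an illustration of the mapping in the table, here is a sketch of how two labels encoded with TEI P5 sample values might be re-encoded in TEI Lex-0. The printed labels come from the example values in the table; the norm values are illustrative assumptions, not a prescribed vocabulary.

    <!-- TEI P5: suggested sample values for type -->
    <usg type="plev">rare</usg>
    <usg type="style">fig.</usg>

    <!-- TEI Lex-0: required values for type, with norm added (values assumed) -->
    <usg type="frequency" norm="rare">rare</usg>
    <usg type="meaningType" norm="figurative">fig.</usg>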

In TEI Lex-0:

  1. The type attribute is made mandatory.
  2. The element <usg> is used in a narrower sense than is currently the case in the TEI Guidelines.
  3. The norm attribute is encouraged.

Justification:

  1. Without a type attribute, <usg> would be an underspecified element. Usage labels describe a wide range of linguistic phenomena, and classifying them should be considered good practice.
  2. The TEI Guidelines currently overuse <usg> to describe phenomena that could be covered by other, more narrowly defined TEI elements. It should be considered good practice to use the most specific TEI element available. See the table above and the next section, Restricting the scope of <usg>.
  3. It is good practice to normalize the values of <usg> elements because dictionaries are not always consistent in the way they use their usage labels. For instance, abbreviated and unabbreviated labels can appear in the same dictionary: they should be normalized to a single value. Normalization should be restricted to a single dictionary; a global normalization effort is currently beyond the scope of TEI Lex-0.
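A minimal sketch of such within-dictionary normalization, assuming a hypothetical German dictionary that writes the same label both abbreviated (ugs.) and in full (umgangssprachlich). The norm value colloquial follows the Pflaume entry in 8.1; the inconsistency itself is invented for illustration.

    <!-- abbreviated form of the label -->
    <usg type="socioCultural" norm="colloquial">ugs.</usg>
    <!-- unabbreviated form of the same label, normalized to the same value -->
    <usg type="socioCultural" norm="colloquial">umgangssprachlich</usg>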

8.3. Restricting the scope of <usg>

  1. Do not use <usg type="lang"> to mark up the name of a language in an etymological or other discussion. The recommended way to encode this information is to use the <lang> element within <etym>.

    INCORRECT

      <entryFree xml:id="MZ.RGJS.сајдисльк_1">
       <form type="lemma">
        <orth>сајдисль́к</orth>
       </form>
       <gramGrp>
        <gram type="pos">м</gram>
       </gramGrp>
       <usg type="lang">тур.</usg>
       <sense>
        <def>уважавање.</def>
       </sense>
      </entryFree>

    CORRECT

      <entry xml:id="MZ.RGJS.сајдисльк_2"
       xml:lang="sr">
       <form type="lemma">
        <orth>сајдисль́к</orth>
       </form>
       <gramGrp>
        <gram type="pos">м</gram>
       </gramGrp>
       <etym>
        <lang value="tr" expand="турцизам"
         norm="tr">*</lang>
       </etym>
       <!--...-->
       <sense xml:id="MZ.RGJS.сајдисльк_2.1">
        <def>уважавање.</def>
        <!--...-->
       </sense>
      </entry>
  2. Do not use <usg type="hyper"></usg> or <usg type="syn"/> to mark lexical relations such as hypernymy or synonymy. The recommended way to encode lexical relations in TEI Lex-0 is the reference mechanism provided by <xr>. See the section on the typology of cross-references.
  3. Do not use <usg type="colloc"></usg> (or, for that matter, types such as "comp", "obj", "subj", "verb", etc.) to encode collocations or rection information. See TODO.
  4. <usg type="hint"></usg> should be used as a fallback when the usage information does not fall into any of the recognized types discussed above, or as an interim solution while a dictionary is being encoded automatically.
  5. Frequency information on lexicographic entities may differ from other types of usage information in that it often cannot be interpreted without further context. In phrases such as “mostly biology” or “rarely used in American English”, it acts as a modifier (quantifier) of another piece of usage information (or of other lexical information). Such uses call for modeling the frequency information as an attribute of the <usg> element it modifies. For frequency information provided explicitly (e.g. corpus frequencies), a separate element should be introduced. TODO
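For item 2, a hypothetical sketch of a synonym expressed with <xr> rather than <usg type="syn">. The type value synonymy, the label text and the xml:id/target values are assumptions made for illustration; the exact values should be checked against the section on the typology of cross-references.

    <sense xml:id="entryA.1">
     <def><!--...--></def>
     <!-- lexical relation encoded as a typed cross-reference, not as usg -->
     <!-- type value and target ID are assumed for illustration -->
     <xr type="synonymy">
      <lbl>syn.</lbl>
      <ref type="entry" target="#entryB"><!--lemma of the synonym--></ref>
     </xr>
    </sense>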

8.4. Hierarchical usage labels

Usage labels tend to be described in dictionaries as flat lists: the list of all labels usually appears in the front matter, often as part of a list of abbreviations, which may include different types of content, i.e. not only usage labels but also other abbreviations (grammatical, etymological, etc.). This is less than ideal from a data-modeling point of view, especially when more generic usage labels (such as sport) appear together with more specific ones (such as football, basketball or volleyball).

To overcome the deficiency of flat representation of labels in general-language dictionaries, TEI Lex-0 recommends that canonical, possibly multilingual, labels be defined, when needed, in the <encodingDesc> section of the <teiHeader>, and then pointed to from the individual entries or senses in which these labels are used. This is possible in both TEI P5 and TEI Lex-0 but has not been documented until now as a solution for representing usage labels.

A <taxonomy> is encoded within a <classDecl> using <category> and <catDesc> elements. TEI Lex-0 is stricter than TEI P5 in that it requires the use of <term> within <catDesc>. The definition of a given <term> can optionally be provided as a <gloss>.

The following example shows the recommended way of encoding two superdomains, Earth Sciences and Sport, together with some of their subdomains:

    <encodingDesc>
     <classDecl>
      <taxonomy xml:id="domain">
       <category xml:id="domain.earth_sciences">
        <catDesc xml:lang="en">
         <term>Earth Sciences</term>
         <gloss>
          <!--Definition of the term would go here.-->
         </gloss>
        </catDesc>
        <catDesc xml:lang="pt">
         <term>Ciências da Terra</term>
        </catDesc>
        <catDesc xml:lang="es">
         <term>Ciencias de la Tierra</term>
        </catDesc>
        <catDesc xml:lang="fr">
         <term>sciences de la Terre</term>
        </catDesc>
        <category xml:id="domain.earth_sciences.geology">
         <catDesc xml:lang="en">
          <term>Geology</term>
         </catDesc>
         <catDesc xml:lang="pt">
          <term>Geologia</term>
         </catDesc>
         <catDesc xml:lang="es">
          <term>Geología</term>
         </catDesc>
         <catDesc xml:lang="fr">
          <term>Géologie</term>
         </catDesc>
         <category xml:id="domain.earth_sciences.geology.mineralogy">
          <catDesc xml:lang="en">
           <term>Mineralogy</term>
          </catDesc>
          <catDesc xml:lang="pt">
           <term>Mineralogia</term>
          </catDesc>
          <catDesc xml:lang="es">
           <term>Mineralogía</term>
          </catDesc>
          <catDesc xml:lang="fr">
           <term>Minéralogie</term>
          </catDesc>
         </category>
        </category>
       </category>
       <category xml:id="domain.sports">
        <catDesc xml:lang="en">
         <term>Sport</term>
        </catDesc>
        <catDesc xml:lang="pt">
         <term>Desporto</term>
        </catDesc>
        <catDesc xml:lang="es">
         <term>Deporte</term>
        </catDesc>
        <catDesc xml:lang="fr">
         <term>Sport</term>
        </catDesc>
        <category xml:id="domain.sports.football">
         <catDesc xml:lang="en">
          <term>Football</term>
         </catDesc>
         <catDesc xml:lang="pt">
          <term>Futebol</term>
         </catDesc>
         <catDesc xml:lang="es">
          <term>Fútbol</term>
         </catDesc>
         <catDesc xml:lang="fr">
          <term>Football</term>
         </catDesc>
        </category>
       </category>
      </taxonomy>
     </classDecl>
    </encodingDesc>

To apply a domain label in an entry, use the <usg> element with a valueDatcat attribute pointing (with a leading #) to the xml:id of the appropriate category in the taxonomy.

    <entry type="mainEntry" xml:lang="pt"
     xml:id="DLPC.cristalografia">
     <form type="lemma">
      <orth>cristalografia</orth>
      <pron>kriʃtɐluɡrɐˈfiɐ</pron>
     </form>
     <gramGrp>
      <gram type="pos" norm="NOUN">n.</gram>
      <gram type="gen">f.</gram>
     </gramGrp>
     <sense xml:id="DLPC.cristalografia_1">
      <usg type="domain"
       valueDatcat="#domain.earth_sciences.geology.mineralogy">Mineralogia</usg>
      <def>ciência que estuda e descreve a forma e a estrutura dos cristais, bem como as leis que regem a sua formação</def>
     </sense>
     <!--etc.-->
    </entry>
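Because the taxonomy is multilingual, entries from dictionaries in other languages, and labels at different levels of the hierarchy, can point into the same structure. A hypothetical sketch for a Spanish dictionary (the xml:id values are invented and the definitions are left as placeholders): the first sense reuses the mineralogy category from the Portuguese entry, while the second, more generic label points to the superdomain domain.sports.

    <sense xml:id="example.es.1">
     <!-- same category as the Portuguese label "Mineralogia" -->
     <usg type="domain"
      valueDatcat="#domain.earth_sciences.geology.mineralogy">Mineralogía</usg>
     <def><!--...--></def>
    </sense>
    <sense xml:id="example.es.2">
     <!-- a more generic label points to the superdomain -->
     <usg type="domain" valueDatcat="#domain.sports">Deporte</usg>
     <def><!--...--></def>
    </sense>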