Victoria University

Māori Vocabulary: A Study of Some High Frequency Homonyms

dc.contributor.advisor Bauer, Winifred
dc.contributor.author Keane-Tuala, Kelly Elizabeth
dc.date.accessioned 2013-10-24T23:27:34Z
dc.date.available 2013-10-24T23:27:34Z
dc.date.copyright 2013
dc.date.issued 2013
dc.description.abstract The problem addressed in this thesis concerns the accuracy of Māori language vocabulary counts such as that of Boyce (2006), which found that Māori uses a very small vocabulary in comparison with languages such as English. As Boyce (2006, ii) acknowledges, this is partly explained by the degree of homonymy in Māori, which undermines the accuracy of the count. Homonymy is the phenomenon of the same string of letters (word-form) having two or more unrelated meanings (e.g. kī ‘say’, ‘be full’). Automated word-form counts of Māori language texts count every occurrence of the form kī as the same word, regardless of its meaning. Unless different meanings of the same word-form are counted as different words, such counts will underestimate the vocabulary of the Māori language. (Homonymy is not the only explanation for the low count; further explanations have been suggested by Bauer (2009) and Nation (2011).) The thesis explores whether there are consistent clues in the linguistic environment that signal the correct interpretation of homonyms in texts, and if so, how such clues could be used for tagging corpora so that counting would be more accurate. The Boyce corpus of modern broadcast Māori (Boyce, 2006, ii) provided the data. Case studies were made of three high-frequency homonyms in this corpus: kī ‘say’, ‘full’; mea ‘say’, ‘thing’; and tau ‘settle’, ‘year’. Lyons' (1968) criterion of distinction was applied to establish the lexemes realised by each of these word-forms on the basis of dictionary and etymological information. The tokens of each word-form were then extracted from Boyce’s (2006) corpus using the concordance program ‘WordSmith Tools’. WordSmith Tools is a computer program for examining how words behave in a text; Concord, one of its components, enables the user to see any word or phrase in context.
Phrase peripheries (the words before and after each word-form in the same phrase) were analysed, and the wider syntactic environment was also examined, in order to find clues that signalled the appropriate lexeme for each token. The results showed that the lexemes from all three case studies could be identified in the corpus on the basis of consistent clues in their linguistic environment. If the phrasal periphery of the word-form is examined, and the grammatical information supplied by the wider linguistic environment is taken into account, it is possible to determine the appropriate lexemic tag for a word-form in a Māori corpus. en_NZ
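
The counting problem described in the abstract can be illustrated with a minimal Python sketch. It compares a plain word-form count with a count made after tagging the form tau by the word immediately before it. Everything in it is an assumption made for illustration: the clue table `TAU_CLUES`, the tag names, and the toy token sequence are hypothetical and are not taken from the thesis or from the Boyce corpus, and a real tagger would need the wider syntactic environment as well as the phrase periphery.

```python
from collections import Counter

# Hypothetical clue table: maps the word immediately before "tau" to a lexeme tag.
# These rules are illustrative assumptions, not findings reported in the thesis.
TAU_CLUES = {
    "te": "tau_year",     # determiner, suggesting a nominal phrase
    "ngā": "tau_year",
    "ka": "tau_settle",   # tense/aspect particle, suggesting a verbal phrase
    "kua": "tau_settle",
}


def tag_tokens(tokens):
    """Replace each 'tau' with a lexeme tag chosen from the preceding word."""
    tagged = []
    for i, tok in enumerate(tokens):
        if tok == "tau":
            prev = tokens[i - 1] if i > 0 else None
            # Fall back to an explicit marker when no clue applies.
            tagged.append(TAU_CLUES.get(prev, "tau_UNRESOLVED"))
        else:
            tagged.append(tok)
    return tagged


# Toy token sequence invented for this example (not from the Boyce corpus).
tokens = "ka tau te manu i te tau hou".split()

form_counts = Counter(tokens)                # counts by word-form only
lexeme_counts = Counter(tag_tokens(tokens))  # counts after lexeme tagging

print(len(form_counts), "distinct word-forms")
print(len(lexeme_counts), "distinct items after tagging")
```

In this toy sequence the untagged count treats both occurrences of tau as one word, while the tagged count separates them, which is the underestimation effect the abstract describes.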
dc.language.iso en_NZ
dc.publisher Victoria University of Wellington en_NZ
dc.subject Maori en_NZ
dc.subject Tagging en_NZ
dc.subject Homonym en_NZ
dc.title Māori Vocabulary: A Study of Some High Frequency Homonyms en_NZ
dc.type Text en_NZ
vuwschema.contributor.unit School of Maori Studies : Te Kawa a Māui en_NZ
vuwschema.type.vuw Awarded Research Masters Thesis en_NZ
thesis.degree.discipline Maori Studies en_NZ
thesis.degree.grantor Victoria University of Wellington en_NZ
thesis.degree.level Master's en_NZ
thesis.degree.name Master of Arts en_NZ
vuwschema.subject.anzsrcfor 200407 Lexicography en_NZ
vuwschema.subject.anzsrcfor 200321 Te Reo Māori (Māori Language) en_NZ
vuwschema.subject.anzsrcseo 970120 Expanding Knowledge in Languages, Communication and Culture en_NZ
