Part of Speech Tagger Using Empirical Evaluation of Neural Word Embedding and N-Gram Approaches for Koorete
Abstract
Koorete, a low-resource local language of Ethiopia, is spoken by the people of the Koore Zone in the southern part of the country. It is also used as the medium of instruction for more than 350 thousand Koore people and for some others beyond the zone's border. The language follows the sentence structure “Subject (Zeere utaade) + Object (efaxe) + Verb (Hanta beyiisaxe)”. This paper aims to develop a Part of Speech (POS) tagger for Koorete through an empirical evaluation of neural word embedding and N-gram-based approaches. Within this scope, the neural approach treats tagging as sequence labeling: words are represented as distributed vectors (Word2Vec) and passed to a bidirectional long short-term memory (Bi-LSTM) recurrent neural network (RNN) model. In line with the research question, this model is compared against classic N-gram frequency-based prediction and achieves state-of-the-art POS tagging accuracy. Both the Bi-LSTM RNN and the N-gram models are evaluated on the same manually annotated Koorete POS tagger (KPT) corpus. The KPT corpus contains 1718 sentences with about 33,220 words and is split into 90% for training and 10% for testing. The Bi-LSTM RNN word embedding approach achieved 98.53% accuracy on the training set and 98.49% on the test set, whereas the N-gram approach achieved about 97.10% on the training set and 77.29% on the test set. These results suggest that the Bi-LSTM RNN model is more accurate than the N-gram model.
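To make the N-gram baseline concrete, the following is a minimal Python sketch of a frequency-based tagger with a 90/10 corpus split. The abstract does not state the N-gram order, the backoff scheme, or the tag set, so the bigram model with unigram backoff, the function names, and the English placeholder sentences are all assumptions made for illustration; none of them comes from the KPT corpus or from the authors' implementation.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker


def split_corpus(tagged_sentences, train_fraction=0.9):
    """Split a list of tagged sentences into training and test parts."""
    cut = int(len(tagged_sentences) * train_fraction)
    return tagged_sentences[:cut], tagged_sentences[cut:]


def train_ngram_tagger(tagged_sentences):
    """Count tag frequencies per word and per (previous tag, word) pair."""
    unigram = defaultdict(Counter)
    bigram = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        prev_tag = START
        for word, tag in sentence:
            unigram[word][tag] += 1
            bigram[(prev_tag, word)][tag] += 1
            all_tags[tag] += 1
            prev_tag = tag
    # Fallback for words never seen in training: the most frequent tag.
    default_tag = all_tags.most_common(1)[0][0]
    return unigram, bigram, default_tag


def tag_sentence(words, model):
    """Tag left to right: bigram context first, then unigram, then default."""
    unigram, bigram, default_tag = model
    tags = []
    prev_tag = START
    for word in words:
        if (prev_tag, word) in bigram:
            tag = bigram[(prev_tag, word)].most_common(1)[0][0]
        elif word in unigram:
            tag = unigram[word].most_common(1)[0][0]
        else:
            tag = default_tag
        tags.append(tag)
        prev_tag = tag
    return tags


def accuracy(tagged_sentences, model):
    """Token-level accuracy of the tagger against gold tags."""
    correct = total = 0
    for sentence in tagged_sentences:
        words = [word for word, _ in sentence]
        gold = [tag for _, tag in sentence]
        predicted = tag_sentence(words, model)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0


# Placeholder data (English, not Koorete) used only to exercise the functions.
toy_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
toy_model = train_ngram_tagger(toy_corpus)
toy_tags = tag_sentence(["the", "dog", "sleeps"], toy_model)
```

A tagger of this kind can only reuse tag frequencies it has already counted, so unseen words fall back to a single default tag. That is one plausible reason for a gap between training and test accuracy in count-based models, although the abstract itself does not analyze the cause.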
Keywords: POS Tagging, Word2Vec representation, neural word embedding, N-gram, Deep Learning, Koorete languages
DOI: 10.7176/CEIS/17-1-03
Publication date: February 28th, 2026
ISSN (Paper) 2222-1727, ISSN (Online) 2222-2863
This journal follows the ISO 9001 management standard and is licensed under a Creative Commons Attribution 3.0 License.
Copyright © www.iiste.org
Computer Engineering and Intelligent Systems