Evaluating Random Forest and Decision Tree Algorithms for Resolving Lexical Ambiguity in the Gujarati Language

Avani N Dave; Sanjay M Shah; Nakul R Dave

doi:10.36548/jtcsst.2025.4.006

Open Access

Volume 7 • Issue 4 • December 2025

https://doi.org/10.36548/jtcsst.2025.4.006

Pages 727-752

Abstract

Word sense disambiguation (WSD) is the task of determining the intended meaning of a word from its context, and it is a core problem in natural language processing. Earlier attempts at WSD for Gujarati produced poor results, with various models reporting low accuracy, largely because labeled datasets are scarce and the language is structurally complex, with idiomatic usage and subtle semantic shifts. To address this, we created a new, manually sense-annotated dataset of ambiguous Gujarati words. The corpus covers 50 ambiguous words, each annotated with the sense appropriate to its context, which makes it a useful starting point for evaluating supervised learning models. Using this corpus, we carry out a systematic study of two supervised machine learning algorithms, Decision Tree and Random Forest, under 3-fold and 5-fold cross-validation. Random Forest obtains the higher accuracy of the two, indicating which of the evaluated methods is better suited to this task. The main contributions of this work are the annotated corpus itself and evidence that supervised learning can be effective in improving WSD for Gujarati when suitable data is available.
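The evaluation protocol named in the abstract, k-fold cross-validation with k = 3 and k = 5, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the data is toy placeholder data rather than the paper's corpus, and a most-frequent-sense baseline stands in for the Decision Tree and Random Forest classifiers. All names (MostFrequentSense, stratified_folds, cross_val_accuracy) are hypothetical. Any model exposing fit and predict in the same way could be passed in place of the baseline.

```python
import random
from collections import Counter, defaultdict


class MostFrequentSense:
    """Baseline stand-in: always predicts the most common training sense."""

    def fit(self, contexts, senses):
        self.sense_ = Counter(senses).most_common(1)[0][0]
        return self

    def predict(self, contexts):
        return [self.sense_ for _ in contexts]


def stratified_folds(senses, k, seed=0):
    """Split example indices into k folds, keeping sense proportions similar."""
    by_sense = defaultdict(list)
    for idx, sense in enumerate(senses):
        by_sense[sense].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_sense.values():
        rng.shuffle(indices)
        # Deal each sense's examples round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_val_accuracy(make_model, contexts, senses, k, seed=0):
    """Mean held-out accuracy over k folds; each fold is the test set once."""
    folds = stratified_folds(senses, k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(senses)) if i not in held_out]
        model = make_model().fit(
            [contexts[i] for i in train_idx],
            [senses[i] for i in train_idx],
        )
        predicted = model.predict([contexts[i] for i in test_idx])
        correct = sum(p == senses[i] for p, i in zip(predicted, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Toy placeholder data (NOT from the paper's corpus):
# context tokens and sense labels for one hypothetical ambiguous word.
contexts = [["ctx_a", "ctx_b"]] * 9 + [["ctx_c", "ctx_d"]] * 6
senses = ["sense_1"] * 9 + ["sense_2"] * 6

for k in (3, 5):
    acc = cross_val_accuracy(MostFrequentSense, contexts, senses, k)
    print(f"{k}-fold mean accuracy: {acc:.3f}")
```

Stratifying the folds by sense keeps rare senses represented in every test split, which matters when a word has only a handful of annotated examples. A baseline like this one also gives a floor against which tree-based classifiers can be compared.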

Cite this article

Chicago: Dave, Avani N, Sanjay M Shah, and Nakul R Dave. "Evaluating Random Forest and Decision Tree Algorithms for Resolving Lexical Ambiguity in the Gujarati Language." Journal of Trends in Computer Science and Smart Technology 7, no. 4 (2025): 727-752. doi: 10.36548/jtcsst.2025.4.006

APA: Dave, A. N., Shah, S. M., & Dave, N. R. (2025). Evaluating Random Forest and Decision Tree Algorithms for Resolving Lexical Ambiguity in the Gujarati Language. Journal of Trends in Computer Science and Smart Technology, 7(4), 727-752. https://doi.org/10.36548/jtcsst.2025.4.006

MLA: Dave, Avani N, et al. "Evaluating Random Forest and Decision Tree Algorithms for Resolving Lexical Ambiguity in the Gujarati Language." Journal of Trends in Computer Science and Smart Technology, vol. 7, no. 4, 2025, pp. 727-752. DOI: 10.36548/jtcsst.2025.4.006.

Vancouver: Dave AN, Shah SM, Dave NR. Evaluating Random Forest and Decision Tree Algorithms for Resolving Lexical Ambiguity in the Gujarati Language. Journal of Trends in Computer Science and Smart Technology. 2025;7(4):727-752. doi: 10.36548/jtcsst.2025.4.006

IEEE: A. N. Dave, S. M. Shah, and N. R. Dave, "Evaluating Random Forest and Decision Tree Algorithms for Resolving Lexical Ambiguity in the Gujarati Language," Journal of Trends in Computer Science and Smart Technology, vol. 7, no. 4, pp. 727-752, Dec. 2025, doi: 10.36548/jtcsst.2025.4.006.

Harvard: Dave, A.N., Shah, S.M. and Dave, N.R. (2025) 'Evaluating Random Forest and Decision Tree Algorithms for Resolving Lexical Ambiguity in the Gujarati Language', Journal of Trends in Computer Science and Smart Technology, vol. 7, no. 4, pp. 727-752. Available at: https://doi.org/10.36548/jtcsst.2025.4.006.

BibTeX:

@article{dave2025,
  author    = {Avani N Dave and Sanjay M Shah and Nakul R Dave},
  title     = {{Evaluating Random Forest and Decision Tree Algorithms for Resolving Lexical Ambiguity in the Gujarati Language}},
  journal   = {Journal of Trends in Computer Science and Smart Technology},
  volume    = {7},
  number    = {4},
  pages     = {727-752},
  year      = {2025},
  publisher = {IRO Journals},
  doi       = {10.36548/jtcsst.2025.4.006},
  url       = {https://doi.org/10.36548/jtcsst.2025.4.006}
}

Keywords

Machine Learning; Word Sense Disambiguation; Natural Language Processing; Decision Tree; Random Forest; Sense Annotated Corpus; Gujarati Language

Published: 02 December, 2025

e-ISSN: 2582-4104
4 issues per year
DOI: https://doi.org/10.36548/jtcsst

Indexing: Scopus | GoogleScholar | Crossref | MicrosoftAcademic | ScienceGate | J-Gate

Publisher: Inventive Research Organization
