Winnowing vs Extended-Winnowing: A Comparative Analysis of Plagiarism Detection Algorithms

Keywords

Winnowing Algorithm
Extended Winnowing Algorithm
Jaccard Coefficient
Twitter
Python
React

How to Cite

Shrestha, Shiva, Sushan Shakya, and Sandeep Gautam. 2023. “Winnowing Vs Extended-Winnowing: A Comparative Analysis of Plagiarism Detection Algorithms”. Journal of Trends in Computer Science and Smart Technology 5 (3): 213-32. https://doi.org/10.36548/jtcsst.2023.3.001.

Abstract

Plagiarism is a major problem in the digital world, as people use others’ content without crediting the creator. Proper and efficient algorithms are therefore needed to find plagiarized content on the Internet. This research compares two such algorithms: the winnowing algorithm and the extended winnowing algorithm. The winnowing algorithm can only calculate the similarity rate between documents, whereas the extended winnowing algorithm can also mark the plagiarized text segments in the compared documents. In both algorithms, the similarity rate is calculated using the Jaccard coefficient. Although the extended algorithm is beneficial because it provides a text-marking feature, it consumes more computational power, a trade-off discussed in this study. Previous research has used this approach, but none has compared the algorithms’ performance on small texts. This research therefore tests the algorithms on Twitter data, since a tweet contains a maximum of 280 characters. The application proposed to detect plagiarism in tweets has been developed using Python for the back end and React for the front end.
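As a rough illustration of the ideas named in the abstract, the sketch below computes winnowing fingerprints for two short texts and scores them with the Jaccard coefficient. It is not the authors' implementation. The k-gram length, the window size, the normalization step, the truncated MD5 hash (standing in for a rolling hash), and all function names are assumptions made for this example. The `shared_positions` helper only hints at the position tracking that text marking would need; it is not the extended winnowing algorithm itself.

```python
import hashlib
import re


def kgram_hashes(text, k=5):
    """Return (hash, position) pairs for every k-gram of the normalized text."""
    # Assumed normalization: lowercase, keep only ASCII letters and digits.
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())
    pairs = []
    for i in range(len(cleaned) - k + 1):
        gram = cleaned[i:i + k]
        # Assumption: a truncated MD5 digest replaces a rolling hash here.
        digest = hashlib.md5(gram.encode("utf-8")).hexdigest()
        pairs.append((int(digest[:8], 16), i))
    return pairs


def winnow(text, k=5, window=4):
    """Pick the minimum hash of each window, keeping its position."""
    pairs = kgram_hashes(text, k)
    if not pairs:
        return set()
    selected = set()
    # Texts with fewer hashes than the window size get a single window.
    for start in range(max(len(pairs) - window + 1, 1)):
        win = pairs[start:start + window]
        # On ties, prefer the rightmost occurrence of the minimum hash.
        selected.add(min(win, key=lambda p: (p[0], -p[1])))
    return selected


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 0.0  # Convention chosen here for two empty fingerprints.
    return len(a & b) / len(a | b)


def similarity(text_a, text_b, k=5, window=4):
    """Jaccard coefficient of the two texts' fingerprint hash sets."""
    fa = {h for h, _ in winnow(text_a, k, window)}
    fb = {h for h, _ in winnow(text_b, k, window)}
    return jaccard(fa, fb)


def shared_positions(text_a, text_b, k=5, window=4):
    """Positions (in the normalized text_a) of fingerprints also found in text_b."""
    hashes_b = {h for h, _ in winnow(text_b, k, window)}
    return sorted(pos for h, pos in winnow(text_a, k, window) if h in hashes_b)
```

Calling `similarity(tweet_a, tweet_b)` returns a value between 0 and 1. The positions returned by `shared_positions` index into the normalized text, so mapping them back to the original characters would need extra bookkeeping that this sketch omits.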

