Abstract
This research proposes LAWGIC (where law meets logic), an automated legal document classification and retrieval system based on topic modelling with Latent Dirichlet Allocation (LDA). LAWGIC uses the Indian Kanoon website as its data source and focuses on Supreme Court of India documents. LDA extracts themes (topics) from the legal documents, and each document is assigned to its most relevant topic. This reduces the need for manual categorization by legal professionals, with the aim of improving classification accuracy and letting them work more efficiently. The system also provides a user interface for retrieving and exploring documents and topics.
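As an illustration of the topic-assignment step described above, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, followed by assigning each document to its highest-probability topic. It is not the LAWGIC implementation: the abstract does not state which LDA library or inference method is used, so the sampler, the function name `lda_gibbs`, the hyperparameters (`alpha`, `beta`, `iterations`), and the toy tokenized documents are all assumptions made for the example.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling (hypothetical helper).

    docs: list of token lists. Returns one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                             # total tokens in topic k
    assignments = []                                           # topic of every token

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed per-document topic distributions.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


# Toy, already-tokenized documents (illustrative only, not Indian Kanoon data).
docs = [
    "contract breach damages contract party".split(),
    "murder accused conviction appeal sentence".split(),
    "contract agreement party damages".split(),
    "accused bail conviction sentence".split(),
]
num_topics = 2
theta = lda_gibbs(docs, num_topics)

# Assign each document to its most probable topic.
labels = [max(range(num_topics), key=row.__getitem__) for row in theta]
for doc, label in zip(docs, labels):
    print(label, " ".join(doc))
```

With a real corpus, the tokenization, stop-word handling, and number of topics would need to be chosen for the data; the sketch only shows how a per-document topic distribution leads to a single topic label.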
