Journal of Trends in Computer Science and Smart Technology is accepted for inclusion in Scopus.

Volume 6 | Issue 4 | December 2024

A Practical Application of Retrieval-Augmented Generation for Website-Based Chatbots: Combining Web Scraping, Vectorization, and Semantic Search
Open Access
Sangita Pokhrel, Bina K C., Prashant Bikram Shah
Pages: 424-442
Cite this article
Pokhrel, Sangita, Bina K C., and Prashant Bikram Shah. "A Practical Application of Retrieval-Augmented Generation for Website-Based Chatbots: Combining Web Scraping, Vectorization, and Semantic Search." Journal of Trends in Computer Science and Smart Technology 6, no. 4 (2024): 424-442.
Published
20 January 2025
Abstract

The Retrieval-Augmented Generation (RAG) model extends the capabilities of large language models (LLMs) by combining information retrieval with text generation, which is particularly useful for applications that need context-aware responses grounded in dynamic data sources. This study presents a practical implementation of a RAG model tailored to a chatbot that answers user inquiries about the content of specific websites. The methodology comprises several key steps: web scraping with BeautifulSoup to extract relevant content, text processing to split this content into manageable chunks, and vectorization to create embeddings for efficient semantic search. Using semantic search, the system retrieves the document segments most relevant to each user query, and the OpenAI API then generates contextually appropriate responses from the retrieved information. The key results indicate that the system provides accurate and relevant answers, with evaluation centered on response quality, retrieval efficiency, and user satisfaction. This research contributes an end-to-end integration of scraping, vectorization, and semantic search into a cohesive chatbot application and offers practical insights into implementing RAG models.
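The pipeline described in the abstract (scrape, chunk, embed, search, then generate) can be sketched end to end. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: Python's `html.parser` stands in for BeautifulSoup, a term-frequency vector (the hypothetical `embed` function) stands in for a learned embedding model, and the call to the OpenAI API is left out, so the sketch stops at the grounded prompt. The chunk size, overlap, and prompt wording are illustrative choices, not values reported in the paper.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Stand-in for the BeautifulSoup scraping step: HTML in, plain text out."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once a window would only repeat words already covered by the previous one.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Hypothetical embedding: a sparse term-frequency vector (a real system would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], vectors: list[Counter], k: int = 3) -> list[str]:
    """Semantic-search step: rank stored chunk vectors by cosine similarity to the query."""
    q = embed(query)
    ranked = sorted(range(len(chunks)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [chunks[i] for i in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the grounded prompt that would be sent to the LLM (not sent here)."""
    joined = "\n---\n".join(context)
    return (
        "Answer using only the context below. If the answer is not in it, say so.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    page = (
        "<html><head><style>body { margin: 0 }</style></head><body>"
        "<h1>Admissions</h1><p>Applications for the fall term close on 1 June.</p>"
        "<p>The library is open from 8 am to 8 pm on weekdays.</p>"
        "<script>track();</script></body></html>"
    )
    chunks = chunk_words(extract_text(page), size=12, overlap=3)
    vectors = [embed(c) for c in chunks]  # index built once, when the site is scraped
    question = "What hours is the library open?"
    print(build_prompt(question, top_k(question, chunks, vectors, k=1)))
```

In a real deployment, `embed` would be replaced by a call to an embedding model and the output of `build_prompt` would be passed to an LLM such as one served through the OpenAI API. The chunk vectors are computed once at indexing time, so only the query needs to be embedded per request; the overlap between chunks reduces the chance that an answer is split across a chunk boundary.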

Keywords

Large Language Models; Retrieval-Augmented Generation (RAG); Chatbot; OpenAI API; Natural Language Processing

Article Processing Charges

Journal of Trends in Computer Science and Smart Technology (JTCSST) is an open access journal. When a paper is accepted for publication, the authors are required to pay an Article Processing Charge (APC) to cover its editorial and production costs. The APC is 400 USD per article, with no additional charges for color, length, figures, or other elements.

Fee schedule:
Article Access Charge: 30 USD
Article Processing Charge: 400 USD
Annual Subscription Fee: 200 USD
Payment gateways: PayPal, Townscript, Razorpay
After payment, please send an email to irojournals.contact@gmail.com or journals@iroglobal.com requesting article access.