Volume - 6 | Issue - 4 | December 2024
Published: 20 January 2025
Retrieval-Augmented Generation (RAG) significantly enhances the capabilities of large language models (LLMs) by integrating information retrieval with text generation, which is particularly relevant for applications requiring context-aware responses grounded in dynamic data sources. This study presents a practical implementation of a RAG model tailored to a chatbot that answers user inquiries from specific websites. The methodology encompasses several key steps: web scraping with BeautifulSoup to extract relevant content, text processing to segment this content into manageable chunks, and vectorization to create embeddings for efficient semantic search. Using semantic search, the system retrieves the document segments most relevant to a user's query; the OpenAI API then generates contextually appropriate responses from the retrieved information. Key results highlight the system's effectiveness in providing accurate and relevant answers, with evaluation metrics centered on response quality, retrieval efficiency, and user satisfaction. This research contributes a comprehensive integration of scraping, vectorization, and semantic search into a cohesive chatbot application, offering practical insights into the implementation of RAG models.
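The pipeline outlined in the abstract (chunking, embedding, semantic retrieval, then generation) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `embed` function below is a toy bag-of-words stand-in for real OpenAI embeddings, and all names and chunk sizes are assumptions made for the example.

```python
# Minimal sketch of the RAG retrieval stage described in the abstract.
# embed() is a toy bag-of-words substitute for OpenAI embeddings;
# function names and parameters are illustrative, not from the paper.
import math
from collections import Counter

def chunk_text(text, max_words=50):
    """Segment extracted page text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)),
                  reverse=True)[:top_k]

# Example: text that would have been scraped with BeautifulSoup.
page = ("RAG combines retrieval with generation. "
        "BeautifulSoup extracts text from web pages. "
        "Embeddings enable semantic search over chunks.")
chunks = chunk_text(page, max_words=8)
context = retrieve("How does semantic search work?", chunks)
# In the full system, `context` would be passed to the OpenAI
# chat completions API as grounding for the generated answer.
```

In production, `embed` would call an embedding model and the retrieved chunks would be concatenated into the prompt sent to the LLM; the ranking logic itself is unchanged.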
Keywords: Large Language Models, Retrieval-Augmented Generation (RAG), Chatbot, OpenAI API, Natural Language Processing