Volume 6 | Issue 1 | March 2024

TF-IDF Vectorization and Clustering for Extractive Text Summarization
Open Access
Muthu Virumeshwaran T, R Thirumahal
Pages: 96-111
Cite this article
T, Muthu Virumeshwaran, and R Thirumahal. "TF-IDF Vectorization and Clustering for Extractive Text Summarization." Journal of Information Technology and Digital World 6, no. 1 (2024): 96-111.
Published
29 April, 2024
Abstract

Extractive document summarization condenses large volumes of text while retaining key information. This research introduces a dynamic feature space mapping approach to improve it. The proposed method extracts document properties such as term frequency, sentence length, and sentence position to describe the content comprehensively. A mapping function projects these features into a dynamic feature space, improving summarization efficiency and feature clarity. Clustering similar phrases in this space makes sentences easier to group, which aids summary creation. Using TF-IDF vectorization, the most representative phrases are then selected from each cluster based on importance and diversity, producing a high-quality document summary quickly and systematically. The approach addresses challenges in extractive summarization and contributes to automated text summarization. It applies to domains that require rapid extraction of information from large volumes of text.
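
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: vectorize sentences with TF-IDF, cluster them, and keep the sentence closest to each cluster centroid. It assumes K-Means (named in the keywords) as the clustering step and uses plain Python throughout. It omits the sentence length and position features and the dynamic feature space mapping, and every function name (`split_sentences`, `tfidf_vectors`, `kmeans`, `summarize`) is hypothetical rather than taken from the article.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_vectors(sentences):
    """Treat each sentence as a document; return L2-normalised TF-IDF rows."""
    tokenised = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokenised for t in toks})
    n = len(sentences)
    df = Counter(t for toks in tokenised for t in set(toks))
    # Smoothed inverse document frequency, so no term gets a zero weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenised:
        tf = Counter(toks)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=20):
    """Lloyd's algorithm with deterministic, evenly spaced initial centroids."""
    step = len(rows) / k
    centroids = [list(rows[int(i * step)]) for i in range(k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c]))
                  for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


def summarize(text, k=3):
    """Return up to k sentences, one per cluster, in original order."""
    sentences = split_sentences(text)
    k = min(k, len(sentences))
    if k == 0:
        return []
    rows = tfidf_vectors(sentences)
    labels, centroids = kmeans(rows, k)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            # The sentence nearest the centroid stands in for the cluster.
            chosen.append(min(members,
                              key=lambda i: sq_dist(rows[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(document_text, k=5)` would return at most five sentences. Picking one sentence per cluster is a simple way to approximate the "importance and diversity" criterion: closeness to the centroid stands in for importance, and drawing from separate clusters stands in for diversity.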

Keywords

Extractive Summarization; Dynamic Feature Space Mapping; TF-IDF Vectorization; K-Means Clustering; Document Preprocessing; Sentence Clustering; Summarization Efficiency; Feature Extraction; Natural Language Processing; Information Retrieval
