Data Workflow Acceleration: A Smart System for Redundancy Elimination in Machine Learning Pipelines
Ahmed Sarwar Mohammed 
Open Access
Volume 7 • Issue 4 • December 2025
Pages 271-282
Abstract

This paper presents a novel framework designed to accelerate machine learning pipelines. By establishing fine-grained data provenance and applying intelligent reuse strategies, our system identifies and eliminates redundant computations. The approach tackles key challenges, such as managing extensive data traces and accommodating non-deterministic operations, through deduplication and hierarchical reuse techniques. Our framework integrates with existing data processing environments, demonstrating substantial efficiency improvements and enabling faster iterative development cycles for data professionals.
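
The general idea of provenance-based reuse can be illustrated with a minimal sketch. This is not the author's framework: the `ReuseCache` class, the `clean` and `tokenize` steps, and the choice of a SHA-256 hash over JSON-serialized inputs are all assumptions made for illustration. The sketch only shows how a key derived from a step's identity and inputs lets a pipeline skip a computation it has already performed. It assumes deterministic steps with JSON-serializable inputs, and does not attempt the deduplication, hierarchical reuse, or non-deterministic handling mentioned in the abstract.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple


class ReuseCache:
    """Toy provenance-keyed cache (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(step_name: str, args: Tuple[Any, ...]) -> str:
        # Provenance key: hash of the step name plus its serialized inputs.
        # Assumes inputs are JSON-serializable and the step is deterministic.
        payload = json.dumps([step_name, list(args)], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, step: Callable[..., Any], *args: Any) -> Any:
        key = self._key(step.__name__, args)
        if key in self._store:
            # Same step on the same inputs: reuse the stored result.
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = step(*args)
        self._store[key] = result
        return result


def clean(rows: List[str]) -> List[str]:
    # Hypothetical preprocessing step: trim whitespace and lowercase.
    return [r.strip().lower() for r in rows]


def tokenize(rows: List[str]) -> List[List[str]]:
    # Hypothetical preprocessing step: split each row on whitespace.
    return [r.split() for r in rows]


cache = ReuseCache()
raw = ["  Hello World ", "Data  Pipelines"]

# First pass executes both steps; the second pass reuses both results.
first = cache.run(tokenize, cache.run(clean, raw))
second = cache.run(tokenize, cache.run(clean, raw))
```

In this sketch the second pass performs no computation beyond hashing the inputs. A real system would also need to account for changes to a step's code and for steps whose output varies between runs, which the key used here does not capture.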

Cite this article
Mohammed, Ahmed Sarwar. "Data Workflow Acceleration: A Smart System for Redundancy Elimination in Machine Learning Pipelines." Journal of Information Technology and Digital World 7, no. 4 (2025): 271-282. doi: 10.36548/jitdw.2025.4.002
Mohammed, A. S. (2025). Data Workflow Acceleration: A Smart System for Redundancy Elimination in Machine Learning Pipelines. Journal of Information Technology and Digital World, 7(4), 271-282. https://doi.org/10.36548/jitdw.2025.4.002
Mohammed, Ahmed Sarwar. "Data Workflow Acceleration: A Smart System for Redundancy Elimination in Machine Learning Pipelines." Journal of Information Technology and Digital World, vol. 7, no. 4, 2025, pp. 271-282. DOI: 10.36548/jitdw.2025.4.002.
Mohammed AS. Data Workflow Acceleration: A Smart System for Redundancy Elimination in Machine Learning Pipelines. Journal of Information Technology and Digital World. 2025;7(4):271-282. doi: 10.36548/jitdw.2025.4.002
A. S. Mohammed, "Data Workflow Acceleration: A Smart System for Redundancy Elimination in Machine Learning Pipelines," Journal of Information Technology and Digital World, vol. 7, no. 4, pp. 271-282, Dec. 2025, doi: 10.36548/jitdw.2025.4.002.
Mohammed, A.S. (2025) 'Data Workflow Acceleration: A Smart System for Redundancy Elimination in Machine Learning Pipelines', Journal of Information Technology and Digital World, vol. 7, no. 4, pp. 271-282. Available at: https://doi.org/10.36548/jitdw.2025.4.002.
@article{mohammed2025,
  author    = {Ahmed Sarwar Mohammed},
  title     = {{Data Workflow Acceleration: A Smart System for Redundancy Elimination in Machine Learning Pipelines}},
  journal   = {Journal of Information Technology and Digital World},
  volume    = {7},
  number    = {4},
  pages     = {271-282},
  year      = {2025},
  publisher = {Inventive Research Organization},
  doi       = {10.36548/jitdw.2025.4.002},
  url       = {https://doi.org/10.36548/jitdw.2025.4.002}
}
Keywords
Data Provenance; Redundant Computation Elimination; Deduplication; Hierarchical Reuse; Non-Deterministic Operations; Pipeline Optimization; Data Processing Frameworks; Computational Efficiency; Intelligent Reuse Strategies; Iterative Development Acceleration
Published
17 December, 2025