Abstract
This paper proposes an adaptive gated multimodal fusion framework for robust and generalizable Human Activity Recognition (HAR) across heterogeneous sensing modalities such as skeletal pose estimation and inertial measurement units (IMUs). The framework combines deterministic anterior data harmonization with a reliability-based gated fusion mechanism. The gating mechanism is derived mathematically to approximate inverse-variance weighting, which stabilizes the fused output under modality-dependent noise and avoids the need for posterior domain adaptation. To align the multimodal data streams in time, frequency-domain analysis is used to justify resampling all streams to a unified 30 Hz rate that satisfies the Nyquist criterion. Evaluated on the NTU RGB+D 120, UTD-MHAD, and PAMAP2 datasets, the framework achieves statistically significant improvements over static baselines (p < 0.05, d = 2.1) at a computational cost low enough for edge-constrained IoT sensing.
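As a point of reference for the inverse-variance weighting that the gated fusion is said to approximate, the following is a minimal sketch of the classical rule, not the proposed gating network itself. The function names, the two-class score vectors, and the variance values are hypothetical and chosen only for illustration.

```python
# Minimal sketch of classical inverse-variance weighting (illustrative only;
# the learned gating mechanism in the paper is not reproduced here).

def inverse_variance_weights(variances, eps=1e-8):
    """Return weights proportional to 1/variance, normalized to sum to 1."""
    precisions = [1.0 / (v + eps) for v in variances]  # eps guards against division by zero
    total = sum(precisions)
    return [p / total for p in precisions]


def fuse(estimates, variances):
    """Fuse per-modality score vectors using inverse-variance weights."""
    weights = inverse_variance_weights(variances)
    dim = len(estimates[0])
    fused = [
        sum(w * est[i] for w, est in zip(weights, estimates))
        for i in range(dim)
    ]
    return fused, weights


# Hypothetical example: the skeleton stream is assumed less noisy than the IMU stream.
skeleton_scores = [2.0, 0.0]   # assumed class scores from the skeleton branch
imu_scores = [0.0, 2.0]        # assumed class scores from the IMU branch
fused, weights = fuse([skeleton_scores, imu_scores], variances=[1.0, 3.0])
print(weights, fused)
```

In this toy case the lower-variance modality receives the larger weight, which is the behavior a reliability-based gate would be expected to imitate when one sensor stream becomes noisy.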
References
- Baños Legrán, Oresti, Mate Attila Toth, Miguel Damas Hermoso, Héctor Emilio Pomares Cintas, and Ignacio Rojas Ruiz. "Dealing with the Effects of Sensor Displacement in Wearable Activity Recognition." Sensors 14, no. 6 (2014): 9995-10023.
- Bianchi, Valentina, Marco Bassoli, Gianfranco Lombardo, Paolo Fornacciari, Monica Mordonini, and Ilaria De Munari. "IoT Wearable Sensor and Deep Learning: An Integrated Approach for Personalized Human Activity Recognition in a Smart Home Environment." IEEE Internet of Things Journal 6, no. 5 (2019): 8553-8562.
- Bijalwan, Vishwanath, Abdul Manan Khan, Hangyeol Baek, Sangmin Jeon, and Youngshik Kim. "Interpretable Human Activity Recognition with Temporal Convolutional Networks and Model-Agnostic Explanations." IEEE Sensors Journal 24, no. 17 (2024): 27607-27617.
- Brinzea, Razvan, Bulat Khaertdinov, and Stylianos Asteriadis. "Contrastive Learning with Cross-Modal Knowledge Mining for Multimodal Human Activity Recognition." In 2022 International Joint Conference on Neural Networks (IJCNN), IEEE, 2022, 01-08.
- Chen, Chen, Roozbeh Jafari, and Nasser Kehtarnavaz. "UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor." In 2015 IEEE International Conference on Image Processing (ICIP), IEEE, 2015, 168-172.
- Chen, Kaixuan, Dalin Zhang, Lina Yao, Bin Guo, Zhiwen Yu, and Yunhao Liu. "Deep Learning for Sensor-Based Human Activity Recognition: Overview, Challenges, and Opportunities." ACM Computing Surveys (CSUR) 54, no. 4 (2021): 1-40.
- Chi, Hyung-gun, Myoung Hoon Ha, Seunggeun Chi, Sang Wan Lee, Qixing Huang, and Karthik Ramani. "InfoGCN: Representation Learning for Human Skeleton-Based Action Recognition." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 20186-20196.
- Dhekane, Sourish Gunesh, and Thomas Ploetz. "Transfer Learning in Sensor-Based Human Activity Recognition: A Survey." ACM Computing Surveys 57, no. 8 (2025): 1-39.
- Dickens, James, and Pierre Payeur. "Multi-Modal Human Action Segmentation Using Skeletal Video Ensembles." Engineering Proceedings 58, no. 1 (2023): 30.
- Dong, Hao, Moru Liu, Kaiyang Zhou, Eleni Chatzi, Juho Kannala, Cyrill Stachniss, and Olga Fink. "Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models." IEEE Transactions on Pattern Analysis and Machine Intelligence (2026): 5672-5691.
- Duan, Haodong, Yue Zhao, Kai Chen, Dahua Lin, and Bo Dai. "Revisiting Skeleton-Based Action Recognition." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 2959-2968. https://doi.org/10.1109/cvpr52688.2022.00298
- Geng, Xinyang, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, and Pieter Abbeel. "Multimodal Masked Autoencoders Learn Transferable Representations." arXiv preprint arXiv:2205.14204 (2022).
- Guo, Pengyu, and Masaya Nakayama. "Towards User-Generalizable Wearable-Sensor-Based Human Activity Recognition: A Multi-Task Contrastive Learning Approach." Sensors 25, no. 22 (2025): 6988.
- Huang, Sipeng, Yang Chen, Dingchao Wu, Guangwei Yu, and Yong Zhang. "Few-Shot Learning for Human Activity Recognition Based on CSI." In 2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML), IEEE, 2022, 403-409.
- Ijaz, Momal, Renato Diaz, and Chen Chen. "Multimodal Transformer for Nursing Activity Recognition." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 2065-2074.
- Jiang, Wenchao, and Zhaozheng Yin. "Human Activity Recognition Using Wearable Sensors by Deep Convolutional Neural Networks." In Proceedings of the 23rd ACM International Conference on Multimedia, 2015, 1307-1310.
- Khan, Samee Ullah, Maryam Sultana, Sufyan Danish, Deepak Gupta, Norah Saleh Alghamdi, Suchang Woo, Dong-Gyu Lee, and Sangtae Ahn. "Multimodal Feature Fusion for Human Activity Recognition Using Human Centric Temporal Transformer." Engineering Applications of Artificial Intelligence 160 (2025): 111844.
- Kingma, Diederik P., and Jimmy Ba. "Adam: A Method for Stochastic Optimization." arXiv preprint arXiv:1412.6980 (2014). https://doi.org/10.48550/arxiv.1412.6980
- Le, Trung-Hieu, Thai-Khanh Nguyen, Trung-Kien Tran, Thanh-Hai Tran, and Cuong Pham. "GAFormer: Wearable IMU-Based Human Activity Recognition with Gramian Angular Field and Transformer." In 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), IEEE, 2023, 297-303.
- Lee, James, and Suk-ju Kang. "Skeleton Action Recognition Using Two-Stream Adaptive Graph Convolutional Networks." In 2021 36th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC), IEEE, 2021, 1-3.
- Liu, Jiayang, Lin Zhong, Jehan Wickramasuriya, and Venu Vasudevan. "uWave: Accelerometer-Based Personalized Gesture Recognition and Its Applications." Pervasive and Mobile Computing 5, no. 6 (2009): 657-675.
- Liu, Jun, Amir Shahroudy, Mauricio Perez, Gang Wang, Ling-Yu Duan, and Alex C. Kot. "NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding." IEEE Transactions on Pattern Analysis and Machine Intelligence 42, no. 10 (2019): 2684-2701.
- Luo, Jinzhao, Lu Zhou, Guibo Zhu, Guojing Ge, Beiying Yang, and Jinqiao Wang. "Temporal-Channel Topology Enhanced Network for Skeleton-Based Action Recognition." In Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Singapore: Springer Nature Singapore, 2023, 109-119.
- Mazzia, Vittorio, Simone Angarano, Francesco Salvetti, Federico Angelini, and Marcello Chiaberge. "Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition." Pattern Recognition 124 (2022): 108487.
- Miah, Abu Saleh Musa, Yong Seok Hwang, and Jungpil Shin. "Sensor-Based Human Activity Recognition Based on Multi-Stream Time-Varying Features with ECA-Net Dimensionality Reduction." IEEE Access 12 (2024): 151649-151668.
- Miao, Shenghuan, and Ling Chen. "GOAT: A Generalized Cross-Dataset Activity Recognition Framework with Natural Language Supervision." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, no. 4 (2024): 1-28.
- Ordóñez, Francisco Javier, and Daniel Roggen. "Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition." Sensors 16, no. 1 (2016): 115.
- Paszke, Adam, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen et al. "PyTorch: An Imperative Style, High-Performance Deep Learning Library." Advances in Neural Information Processing Systems 32 (2019): 8026-8037.
- Plizzari, Chiara, Marco Cannici, and Matteo Matteucci. "Skeleton-Based Action Recognition via Spatial and Temporal Transformer Networks." Computer Vision and Image Understanding 208 (2021): 103219.
- Qiuming, Liu, Chen Longping, Wang Da, Xiao He, Zhou Yang, and Wu Dong. "Decoupled 2S-AGCN Human Behavior Recognition Based on New Partition Strategy." In International Conference on Mobile Networks and Management, Cham: Springer Nature Switzerland, 2023, 70-82.
- Quan, Zhenzhen, Qingshan Chen, Wei Wang, Moyan Zhang, Xiang Li, Yujun Li, and Zhi Liu. "SMTDKD: A Semantic-Aware Multimodal Transformer Fusion Decoupled Knowledge Distillation Method for Action Recognition." IEEE Sensors Journal 24, no. 2 (2023): 2289-2304.
- Qureshi, Tayyab Saeed, Muhammad Haris Shahid, Asma Ahmad Farhan, and Sultan Alamri. "A Systematic Literature Review on Human Activity Recognition Using Smart Devices: Advances, Challenges, and Future Directions." Artificial Intelligence Review 58, no. 9 (2025): 276.
- Ray, Abhisek, and Mahesh Kolekar. "Skeleton-Based Action Recognition Using Graph Convolution and Cross-Domain Transfer Learning." In 2024 National Conference on Communications (NCC), IEEE, 2024, 01-06.
- Reiss, Attila. "PAMAP2 Physical Activity Monitoring." UCI Machine Learning Repository (2012). https://doi.org/10.24432/C5NW2H
- Shahverdi, Hossein, and Seyed Ghorashi. "Lightweight Transformer for Robust Human Activity Recognition Using Smartphone IMU Data." In 14th International Conference on Human Interaction and Emerging Technologies: Artificial Intelligence & Future Applications, IHIET-FS 2025, June 10-12, 2025, University of East London, London, United Kingdom, vol. 196, AHFE International, 2025, 238-248.
- Shi, Lei, Yifan Zhang, Jian Cheng, and Hanqing Lu. "Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, 12026-12035.
- Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine Learning Research 15, no. 1 (2014): 1929-1958.
- Subramanian, Shreyas, Bala Krishnamoorthy, and Pranav Murthy. "Dynamic Learning Rate Scheduling Based on Loss Changes Leads to Faster Convergence." arXiv preprint arXiv:2512.14527 (2025).
- Tang, Yin, Qi Teng, Lei Zhang, Fuhong Min, and Jun He. "Layer-Wise Training Convolutional Neural Networks with Smaller Filters for Human Activity Recognition Using Wearable Sensors." IEEE Sensors Journal 21, no. 1 (2020): 581-592.
- Wang, Guanbo, Jiapeng Guo, Jiazhong Zhang, Xiangting Qi, and Hang Song. "Design of Human Action Recognition Method Based on Cross Attention and 2s-AGCN Model." In 2024 IEEE 6th International Conference on Civil Aviation Safety and Information Technology (ICCASIT), IEEE, 2024, 1341-1345.
- Wang, Jindong, Vincent W. Zheng, Yiqiang Chen, and Meiyu Huang. "Deep Transfer Learning for Cross-Domain Activity Recognition." In Proceedings of the 3rd International Conference on Crowd Science and Engineering, 2018, 1-8.
- Wang, Xiaojuan, Tianqi Lv, Ziliang Gan, Mingshu He, and Lei Jin. "Fusion of Skeleton and Inertial Data for Human Action Recognition Based on Skeleton Motion Maps and Dilated Convolution." IEEE Sensors Journal 21, no. 21 (2021): 24653-24664.
- Wei, Jinfeng, Yunxin Wang, Mengli Guo, Pei Lv, Xiaoshan Yang, and Mingliang Xu. "Dynamic Hypergraph Convolutional Networks for Skeleton-Based Action Recognition." arXiv preprint arXiv:2112.10570 (2021).
- Xu, Cheng, Duo Chai, Jie He, Xiaotong Zhang, and Shihong Duan. "InnoHAR: A Deep Neural Network for Complex Human Activity Recognition." IEEE Access 7 (2019): 9893-9902.
- Yan, Sijie, Yuanjun Xiong, and Dahua Lin. "Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1. 2018.
- Yang, Jing, Tianzheng Liao, Jingjing Zhao, Yan Yan, Yichun Huang, Zhijia Zhao, Jing Xiong, and Changhong Liu. "Domain Adaptation for Sensor-Based Human Activity Recognition with a Graph Convolutional Network." Mathematics 12, no. 4 (2024): 556.
- Yang, Kyoung Ok, Junho Koh, and Jun Won Choi. "Unified Contrastive Fusion Transformer for Multimodal Human Action Recognition." arXiv preprint arXiv:2309.05032 (2023).
- Yao, Shuochao, Shaohan Hu, Yiran Zhao, Aston Zhang, and Tarek Abdelzaher. "DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing." In Proceedings of the 26th International Conference on World Wide Web, 2017, 351-360.
- Zhang, Yumin, and Yanyong Wang. "A Comprehensive Survey on RGB-D-Based Human Action Recognition: Algorithms, Datasets, and Popular Applications." EURASIP Journal on Image and Video Processing 2025, no. 1 (2025): 15.
- Zheng, Ce, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen, and Zhengming Ding. "3D Human Pose Estimation with Spatial and Temporal Transformers." In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, 11656-11665.
- Zhu, Guanzhou, Dong Zhao, Chunliang Li, Mingyue Zhao, Zhengyuan Zhang, Hefeng Quan, and Huadong Ma. "MASTER: A Multi-Modal Foundation Model for Human Activity Recognition." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 9, no. 3 (2025): 1-26.
- Zhu, Yida, Haiyong Luo, Runze Chen, and Fang Zhao. "DiamondNet: A Neural-Network-Based Heterogeneous Sensor Attentive Fusion for Human Activity Recognition." IEEE Transactions on Neural Networks and Learning Systems 35, no. 11 (2023): 15321-15331.

Journal of Trends in Computer Science and Smart Technology