AS-FCRNet: a lightweight multi-frame acoustic–seismic fusion network for high-precision ground moving target recognition on UGS
Abstract
High-dimensional acoustic and seismic signals make classification computationally expensive and complicate temporal modeling. To address these challenges, this paper proposes a multi-stage dimensionality reduction and classification framework that integrates Mel-frequency spectrum feature extraction, Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM) networks. Through progressive feature compression and acoustic-seismic feature fusion, the method significantly reduces computational complexity while maintaining competitive classification accuracy. First, Mel-frequency spectrum features are extracted from the dual-channel input (acoustic and seismic), yielding perceptually relevant features aligned with human auditory characteristics. Next, a lightweight CNN performs further feature extraction on the log-Mel energy representations. In the fusion stage, three strategies for combining the acoustic and seismic signals (information-level, feature-level, and decision-level fusion) are compared to identify the best approach; the fused features are then compressed into a short vector for subsequent temporal modeling. Finally, a sequence of compact feature vectors from consecutive frames (e.g., four-frame segments) is fed into an LSTM network to capture temporal dependencies, and classification is based on the output of the last time step. Experimental results demonstrate that the proposed approach balances inference efficiency and model performance, achieving accurate and reliable classification at low computational complexity.
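The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the sample rate, frame length, layer sizes, and random weights are all hypothetical, a single random linear projection stands in for the lightweight CNN and the compression step, and concatenating the two log-Mel vectors is assumed here only as one simple form of feature-level fusion. The sketch shows the shapes involved: per-frame log-Mel energies for each channel, a short fused vector per frame, a four-frame sequence, and classification from the last LSTM time step.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel_energies(frame, bank, n_fft):
    """Log-Mel energy vector of one signal frame."""
    windowed = frame * np.hanning(frame.size)
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    return np.log(bank @ power + 1e-10)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(sequence, W, U, b):
    """Single-layer LSTM; returns the hidden state of the last time step.

    sequence: (T, d_in), W: (4h, d_in), U: (4h, h), b: (4h,).
    """
    hidden_size = U.shape[1]
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x in sequence:
        i, f, g, o = np.split(W @ x + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

# Hypothetical sizes, chosen only for illustration.
SAMPLE_RATE, FRAME_LEN, N_FFT, N_MELS = 1000, 256, 256, 16
N_FRAMES, D_SHORT, D_HIDDEN, N_CLASSES = 4, 8, 12, 3

rng = np.random.default_rng(0)
bank = mel_filterbank(N_MELS, N_FFT, SAMPLE_RATE)

# Synthetic stand-ins for four consecutive acoustic and seismic frames.
acoustic = rng.standard_normal((N_FRAMES, FRAME_LEN))
seismic = rng.standard_normal((N_FRAMES, FRAME_LEN))

# Random projection standing in for the CNN plus compression to a short vector.
P = 0.1 * rng.standard_normal((D_SHORT, 2 * N_MELS))

# Assumed feature-level fusion: concatenate both channels, then compress.
sequence = np.stack([
    P @ np.concatenate([log_mel_energies(a, bank, N_FFT),
                        log_mel_energies(s, bank, N_FFT)])
    for a, s in zip(acoustic, seismic)
])

# Untrained LSTM and classifier weights (random, for shape checking only).
W = 0.1 * rng.standard_normal((4 * D_HIDDEN, D_SHORT))
U = 0.1 * rng.standard_normal((4 * D_HIDDEN, D_HIDDEN))
b = np.zeros(4 * D_HIDDEN)
V = 0.1 * rng.standard_normal((N_CLASSES, D_HIDDEN))

h_last = lstm_last_hidden(sequence, W, U, b)
logits = V @ h_last
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted = int(np.argmax(probs))
```

Because the weights are untrained, the predicted class is meaningless; the point is that the LSTM only ever sees a 4 x 8 sequence rather than the raw frames, which is where the reduction in temporal-modeling cost comes from.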
Copyright (c) 2025 Zheyu Liu, Kunsheng Xing, Wei Wang, Nan Wang

This work is licensed under a Creative Commons Attribution 4.0 International License.