Joint weight adversarial attack based on human skeleton action recognition
Abstract
In recent years, skeleton data has been widely applied in action recognition for fields such as autonomous driving and intelligent security, owing to its strong spatial correlation, small data volume, and high computational efficiency. In practice, however, an attacker only needs to apply a small perturbation to the input skeleton data to make the target model misrecognize the corresponding action. This causes a significant drop in recognition accuracy and can have serious consequences in high-risk scenarios such as autonomous driving. To study this vulnerability, many attack methods have been proposed, such as attacks that limit the angle changes between bones or attacks that alter bone lengths. These methods can, to a certain extent, raise the attack success rate against action recognition models, but most of them perturb the skeleton data without considering the influence of each joint on the overall action. In this paper, we propose a new adversarial attack method that perturbs the coordinate data of all skeletal joints. We introduce the concept of joint weights and, based on these weights, design a time cropping translation attack to improve the attack success rate. Experimental results show that the attack success rate of our method remains stable at over 60%.
Copyright (c) 2025 Haopeng Mu, Ling Guo, Xiaozhou Zhang

This work is licensed under a Creative Commons Attribution 4.0 International License.

