Joint weight adversarial attack based on human skeleton action recognition

Haopeng Mu; Ling Guo; Xiaozhou Zhang

doi:10.59400/cai2342

Haopeng Mu
School of Information Science and Technology, Northwest University, Xi’an 710500, China
Ling Guo
School of Information Science and Technology, Northwest University, Xi’an 710500, China
Xiaozhou Zhang
School of Information Science and Technology, Northwest University, Xi’an 710500, China

Article ID: 2342

Keywords: human action recognition, skeleton sequence, black-box attack, adversarial machine learning

Abstract

In recent years, skeleton data has been widely used for action recognition in fields such as autonomous driving and intelligent security, owing to its strong spatial correlation, small data volume, and high computational efficiency. In practice, however, an attacker only needs to apply a small perturbation to the input skeleton data to make the target model misclassify the action, which significantly reduces recognition accuracy and can have serious consequences in high-risk scenarios such as autonomous driving. To study this vulnerability, many attack methods have been proposed, such as attacks that limit the change in angles between bones and attacks that alter bone lengths. These methods raise the attack success rate against action recognition models to some extent, but most of them disregard the influence of each individual joint on the overall action. In this paper, we propose a new adversarial attack method that perturbs the coordinate data of all skeletal joints. We introduce the concept of joint weights and design a time cropping translation attack based on them to improve the attack success rate. Experimental results show that the attack success rate of our method remains stable at over 60%.

Published

2025-07-04

How to Cite

Mu, H., Guo, L., & Zhang, X. (2025). Joint weight adversarial attack based on human skeleton action recognition. Computing and Artificial Intelligence, 3(3). https://doi.org/10.59400/cai2342

Issue

Vol. 3 No. 3 (2025)

Section

Article

This work is licensed under a Creative Commons Attribution 4.0 International License.


