Automated quality control of recycled aggregates via deep learning: A unified framework for instance segmentation and mass estimation
Abstract
The large-scale use of recycled aggregates (RA) in high-grade construction applications is currently hindered by the high variability of their physical properties. Current quality control relies on manual sorting, which is labor-intensive and difficult to scale. This study presents RAMSES (Recycled Aggregates Mass estimation and Segmentation), an automated deep learning framework designed to bridge the gap between high-speed production and rigorous material characterization. A central contribution of this work is a large-scale, publicly available dataset comprising 90,000 labeled and batch-weighed aggregate instances. This dataset supports statistical robustness across diverse RA compositions and serves as a benchmark for automated waste characterization. Using this dataset, RAMSES performs simultaneous instance segmentation and direct mass estimation from 2D images. Its dual-branch architecture decouples morphological features from instance-dependent density factors. The framework achieves high precision in particle identification (mean Average Precision: mAP@[0.5:0.95] = 0.84, mAP@0.5 = 0.91) and a 0.3% relative error in total mass prediction, which meets industrial requirements for batch monitoring. By providing a scalable alternative to manual inspection, this approach improves the consistency of RA-based concrete mixes and supports the transition to a circular construction economy.
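As a rough illustration of the batch-level mass metric mentioned in the abstract, the sketch below sums per-instance mass predictions and compares the total with the weighed batch mass. The function name, the use of an absolute (unsigned) error, and the sample values are assumptions made for illustration; the abstract does not specify how the relative error is computed.

```python
def batch_mass_relative_error(predicted_masses_g, weighed_batch_mass_g):
    """Relative error between the summed per-particle predictions and the batch weight.

    Assumption: the error is taken as an absolute (unsigned) ratio.
    """
    if weighed_batch_mass_g <= 0:
        raise ValueError("weighed batch mass must be positive")
    predicted_total_g = sum(predicted_masses_g)
    return abs(predicted_total_g - weighed_batch_mass_g) / weighed_batch_mass_g


# Hypothetical values in grams, for illustration only (not taken from the dataset).
predicted = [12.4, 8.1, 15.0, 9.7]
weighed = 45.0

error = batch_mass_relative_error(predicted, weighed)
print(f"relative error on total batch mass: {error:.2%}")
```

Because only the batch total is compared, over- and under-estimates on individual particles can cancel out, which is why a batch-level error can be much smaller than the typical per-particle error.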
Copyright (c) 2026 Jérôme Lux, Pierre-Yves Mahieux, Philippe Turcry

This work is licensed under a Creative Commons Attribution 4.0 International License.