Multifidelity Bayesian optimization for hyperparameter tuning of deep reinforcement learning algorithms
Abstract
This research compares standard Bayesian optimization and multifidelity Bayesian optimization for hyperparameter search, with the aim of improving the performance of reinforcement learning algorithms in environments such as OpenAI Gym's LunarLander and CartPole. The primary goal is to determine whether multifidelity Bayesian optimization provides significant improvements in solution quality compared to standard Bayesian optimization. To address this question, several Python implementations were developed, with solution quality evaluated using the mean of the total rewards obtained as the objective function. Multiple experiments were conducted for each environment and optimization variant using different seeds, ensuring that the results were not merely due to the inherent randomness of reinforcement learning algorithms. The results demonstrate that multifidelity Bayesian optimization outperforms standard Bayesian optimization in several key aspects. In the LunarLander environment, multifidelity optimization achieved better convergence and more stable performance, yielding a higher average reward than the standard version. In the CartPole environment, although both methods quickly reached the maximum reward, the multifidelity version did so with greater consistency and in less time. These findings highlight the ability of multifidelity optimization to tune hyperparameters more efficiently, using fewer resources and less time while achieving superior performance.
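The procedure the abstract describes can be illustrated with a minimal sketch: a Gaussian-process surrogate with expected improvement, where a cheap low-fidelity budget (few training episodes) screens the search space before a small high-fidelity budget is spent on the most promising hyperparameters. This is not the paper's implementation; `mean_reward` is a hypothetical toy stand-in for "mean total reward after training", and all names and budgets here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def mean_reward(log_lr, episodes):
    # Toy stand-in for the true objective (mean total reward of a trained
    # agent): peaks at log_lr = -3; low-fidelity runs are systematically off.
    return -(log_lr + 3.0) ** 2 - 1.0 / episodes

def expected_improvement(mu, sigma, best):
    # EI for maximization: expected amount by which a candidate beats `best`.
    z = (mu - best) / np.maximum(sigma, 1e-9)
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def multifidelity_bo(n_cheap=15, n_expensive=5, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1: screen the 1-D search space (log learning rate in [-6, 0])
    # with cheap, low-fidelity evaluations (10 episodes each).
    X = rng.uniform(-6.0, 0.0, size=(n_cheap, 1))
    y = np.array([mean_reward(x[0], episodes=10) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    # Stage 2: spend the expensive high-fidelity budget (1000 episodes)
    # only where expected improvement is largest.
    for _ in range(n_expensive):
        gp.fit(X, y)
        cand = rng.uniform(-6.0, 0.0, size=(256, 1))
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(expected_improvement(mu, sigma, y.max()))]
        X = np.vstack([X, x_next])
        y = np.append(y, mean_reward(x_next[0], episodes=1000))
    return X[np.argmax(y), 0]  # best log learning rate found
```

In the standard (single-fidelity) variant, every evaluation would use the full episode budget; the multifidelity variant reaches a comparable incumbent with far fewer expensive runs, which is the efficiency gain the experiments measure.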
Copyright (c) 2025 Author(s)

This work is licensed under a Creative Commons Attribution 4.0 International License.
