eprintid: 19022
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/01/90/22
datestamp: 2024-06-04 14:11:28
lastmod: 2024-06-04 14:11:28
status_changed: 2024-06-04 14:04:39
type: article
metadata_visibility: show
creators_name: Sumiea, E.H.H.
creators_name: Abdulkadir, S.J.
creators_name: Ragab, M.G.
creators_name: Al-Selwi, S.M.
creators_name: Fati, S.M.
creators_name: Alqushaibi, A.
creators_name: Alhussian, H.
title: Enhanced Deep Deterministic Policy Gradient Algorithm Using Grey Wolf Optimizer for Continuous Control Tasks
ispublished: pub
keywords: E-learning; Learning algorithms; Neural networks; Optimal control systems, Aerospace electronics; Deep deterministic policy gradient; Deep reinforcement learning; Deterministics; Game; Gray wolf optimization; Gray wolves; Hyper-parameter optimizations; Metaheuristic; Optimisations; Policy gradient; Reinforcement learnings; Task analysis, Reinforcement learning
note: cited By 4
abstract: Deep Reinforcement Learning (DRL) allows agents to make decisions in a specific environment based on a reward function, without prior knowledge. Adapting hyperparameters significantly impacts the learning process and training time, and precise estimation of hyperparameters during DRL training poses a major challenge. To tackle this problem, this study utilizes Grey Wolf Optimization (GWO), a metaheuristic algorithm, to optimize the hyperparameters of the Deep Deterministic Policy Gradient (DDPG) algorithm for achieving an optimal control strategy in two simulated Gymnasium environments provided by OpenAI. The ability to adapt hyperparameters accurately contributes to faster convergence and enhanced learning, ultimately leading to more efficient control strategies. The proposed DDPG-GWO algorithm is evaluated in the 2DRobot and MountainCarContinuous simulation environments, chosen for their ease of implementation. Our experimental results reveal that optimizing the hyperparameters of DDPG with the GWO algorithm in the Gymnasium environments maximizes the total rewards during testing episodes while ensuring the stability of the learning policy. This is evident when comparing our proposed DDPG-GWO agent, with optimized hyperparameters, against the original DDPG. In the 2DRobot environment, the original DDPG obtained rewards ranging from -150 to -50, whereas the proposed DDPG-GWO obtained rewards ranging from -100 to 100, with a running average between 1 and 800 across 892 episodes. In the MountainCarContinuous environment, the original DDPG struggled with negative rewards, while the proposed DDPG-GWO achieved rewards between 20 and 80 over 218 episodes with a total of 490 timesteps. © 2013 IEEE.
date: 2023
publisher: Institute of Electrical and Electronics Engineers Inc.
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85179805119&doi=10.1109%2fACCESS.2023.3341507&partnerID=40&md5=d5f7992e18d56c1c03f28cbfea55c188
id_number: 10.1109/ACCESS.2023.3341507
full_text_status: none
publication: IEEE Access
volume: 11
pagerange: 139771-139784
refereed: TRUE
issn: 21693536
citation: Sumiea, E.H.H. and Abdulkadir, S.J. and Ragab, M.G. and Al-Selwi, S.M. and Fati, S.M. and Alqushaibi, A. and Alhussian, H. (2023) Enhanced Deep Deterministic Policy Gradient Algorithm Using Grey Wolf Optimizer for Continuous Control Tasks. IEEE Access, 11. pp. 139771-139784. ISSN 21693536
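
Illustrative sketch (not part of the record, and not the authors' released code): the abstract describes wrapping DDPG training in a Grey Wolf Optimizer that searches hyperparameters. The following minimal Python sketch shows one way such a loop could look, under the assumption of stable-baselines3's DDPG implementation, Gymnasium's MountainCarContinuous-v0 environment, and a two-dimensional search over learning rate and discount factor gamma; the bounds, wolf count, iteration budget, and training/evaluation lengths are hypothetical choices for illustration only.

    # Hedged sketch: GWO-style hyperparameter search around short DDPG runs.
    # Assumptions (not from the paper): stable-baselines3 DDPG, Gymnasium's
    # MountainCarContinuous-v0, and illustrative search bounds/budgets.
    import numpy as np
    import gymnasium as gym
    from stable_baselines3 import DDPG
    from stable_baselines3.common.evaluation import evaluate_policy

    # Search space: [learning_rate, gamma] -- illustrative ranges.
    LOWER = np.array([1e-4, 0.90])
    UPPER = np.array([1e-2, 0.999])

    def fitness(position, train_steps=5_000, eval_episodes=3):
        """Train a short DDPG run with the candidate hyperparameters and
        return the mean evaluation reward (higher is better)."""
        lr, gamma = position
        env = gym.make("MountainCarContinuous-v0")
        model = DDPG("MlpPolicy", env, learning_rate=float(lr),
                     gamma=float(gamma), verbose=0)
        model.learn(total_timesteps=train_steps)
        mean_reward, _ = evaluate_policy(model, env,
                                         n_eval_episodes=eval_episodes)
        env.close()
        return mean_reward

    def grey_wolf_search(n_wolves=5, n_iters=10, seed=0):
        rng = np.random.default_rng(seed)
        dim = LOWER.size
        # Initialise the pack uniformly inside the bounds.
        wolves = LOWER + rng.random((n_wolves, dim)) * (UPPER - LOWER)
        scores = np.array([fitness(w) for w in wolves])

        for t in range(n_iters):
            # Alpha, beta, delta = three best wolves so far (maximising reward).
            order = np.argsort(scores)[::-1]
            alpha, beta, delta = wolves[order[0]], wolves[order[1]], wolves[order[2]]
            a = 2.0 * (1.0 - t / n_iters)  # control parameter decays from 2 to 0

            for i in range(n_wolves):
                new_pos = np.zeros(dim)
                for leader in (alpha, beta, delta):
                    r1, r2 = rng.random(dim), rng.random(dim)
                    A = 2.0 * a * r1 - a
                    C = 2.0 * r2
                    D = np.abs(C * leader - wolves[i])
                    new_pos += (leader - A * D) / 3.0  # average of the three pulls
                wolves[i] = np.clip(new_pos, LOWER, UPPER)
                scores[i] = fitness(wolves[i])

        best = np.argmax(scores)
        return wolves[best], scores[best]

    if __name__ == "__main__":
        best_params, best_reward = grey_wolf_search()
        print("best [learning_rate, gamma]:", best_params,
              "mean reward:", best_reward)

The sketch follows the standard GWO update (positions pulled toward the alpha, beta, and delta wolves with the control parameter a decaying from 2 to 0); the paper's actual search space, fitness definition, and budgets may differ.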