Application of Q-Learning Controller for Processes with Dead Time
1. Silesian University of Technology, Faculty of Automatic Control, Electronics and Computer Science, Department of Automatic Control and Robotics, Gliwice 44-100, Poland.
Corresponding author
Jakub Musiał
ABSTRACT
This paper presents an extension of a self-improving, model-free Q-learning controller to industrial processes characterized by significant dead time. While conventional Q-learning-based control approaches have demonstrated effectiveness for systems without delay, their direct application to time-delay processes is hindered by the mismatch between control actions and their delayed observable effects. To address this limitation, the proposed method introduces a modified Q-learning update mechanism based on FIFO buffers that delay Q-value updates in accordance with the process dead time, ensuring a proper correlation between state–action pairs and the resulting system responses. Additionally, the reward policy is reformulated for the delayed update structure to support stable and convergent learning.
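The FIFO-buffered update idea can be illustrated with a minimal sketch. This is not the authors' implementation; the learning rate, discount factor, state/action discretization, and dead-time value below are all assumed for illustration. A state–action pair observed at step k is held in a queue for d steps (d being the dead time in samples) and only then updated with the reward its action actually produced:

```python
from collections import deque

# Hypothetical hyperparameters and discretization sizes (illustrative only)
ALPHA, GAMMA = 0.1, 0.9      # learning rate, discount factor
N_STATES, N_ACTIONS = 10, 3  # discretized state/action space sizes
DEAD_TIME = 4                # assumed process dead time, in samples

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
fifo = deque()               # state-action pairs awaiting their observable effect

def step(state, action, reward, next_state):
    """Push the new pair; update the pair whose delayed effect is now visible."""
    fifo.append((state, action))
    if len(fifo) > DEAD_TIME:
        s_old, a_old = fifo.popleft()
        # Standard Q-learning update, applied to the pair from DEAD_TIME steps ago,
        # so the reward is correlated with the action that caused it
        td_target = reward + GAMMA * max(Q[next_state])
        Q[s_old][a_old] += ALPHA * (td_target - Q[s_old][a_old])
```

The queue length matches the dead time expressed in sampling periods, so the ordinary Q-learning recursion is left intact and only the pair being updated is shifted back in time.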
The controller preserves the key practical advantages of the original Q2d framework, including model-free operation, bumpless initialization from existing PI controller parameters, and the ability to learn online during normal operation without externally applied excitation signals. The approach is validated through simulation studies on benchmark first-order plus dead time (FOPDT) processes with different dead times. The results demonstrate that the proposed method enables effective online improvement of setpoint tracking and disturbance rejection over a range of time-delay values and for different accuracies of the dead-time estimate.
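A minimal discrete-time FOPDT model of the kind used in such benchmark studies can be sketched as follows; the gain, time constant, dead time, and sampling period below are assumed values, not those of the paper:

```python
from collections import deque

def make_fopdt(K=1.0, tau=10.0, theta=5, Ts=1.0):
    """First-order plus dead time plant: gain K, time constant tau,
    dead time theta (in samples), sampling period Ts."""
    a = Ts / (tau + Ts)                       # Euler-discretized first-order lag
    buf = deque([0.0] * theta, maxlen=theta)  # transport-delay FIFO buffer
    y = 0.0
    def step(u):
        nonlocal y
        u_delayed = buf[0]        # input applied theta samples ago
        buf.append(u)             # maxlen deque discards the oldest sample
        y = (1 - a) * y + a * K * u_delayed
        return y
    return step

plant = make_fopdt()
outputs = [plant(1.0) for _ in range(60)]  # unit-step response
```

For a unit step, the output stays at zero for theta samples and then rises as a first-order lag toward the steady-state gain, which is exactly the delayed-effect mismatch the FIFO-based update mechanism is designed to handle.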
Overall, the proposed modification extends the applicability of Q-learning-based control to a wider class of industrial processes with time delay, providing a practical route to deploying reinforcement-learning controllers in systems where transport delay is unavoidable.