Traffic light control using deep policy-gradient and value-function based reinforcement learning
Mousavi, Seyed Sajad; Schukat, Michael; Howley, Enda
Publication Date
2017-08-11
Keywords
gradient methods, learning (artificial intelligence), adaptive control, road traffic control, traffic engineering computing, control engineering computing, digital simulation, traffic light control, value-function-based reinforcement learning, deep neural network architectures, complex control problems, high-dimensional state space, action spaces, deep policy-gradient RL algorithm, value-function-based RL algorithms, traffic signal, traffic intersection, adaptive traffic light control agents, graphical traffic simulator, control signals, PG-based agents, optimal control, urban mobility traffic simulator, training process, signal control, function approximation, multiagent system, bottlenecks, algorithms, agent
Type
journal article
Citation
Mousavi, Seyed Sajad; Schukat, Michael; Howley, Enda (2017). Traffic light control using deep policy-gradient and value-function based reinforcement learning. IET Intelligent Transport Systems 11 (7), 417-423.
Abstract
Recent advances in combining deep neural network architectures with reinforcement learning (RL) techniques have shown promising results in solving complex control problems with high-dimensional state and action spaces. Inspired by these successes, in this study the authors build two kinds of RL agents, a deep policy-gradient (PG) agent and a value-function-based agent, which predict the best possible traffic signal for a traffic intersection. At each time step, these adaptive traffic light control agents receive a snapshot of the current state of a graphical traffic simulator and produce control signals. The PG-based agent maps its observation directly to the control signal, whereas the value-function-based agent first estimates values for all legal control signals and then selects the control action with the highest value. Both methods show promising results in a traffic network simulated in the Simulation of Urban MObility (SUMO) traffic simulator, without suffering from instability issues during training.
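The abstract contrasts two action-selection schemes: the PG agent maps an observation directly to a distribution over control signals, while the value-function-based agent scores every legal signal and picks the highest-valued one. A minimal sketch of that contrast is below; linear function approximators stand in for the paper's deep networks, and the state dimension, phase set, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper):
STATE_DIM = 8   # e.g. a flattened snapshot such as queue lengths per lane
N_PHASES = 4    # e.g. legal signal phases: NS-green, NS-left, EW-green, EW-left

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def pg_agent_act(state, weights):
    """Policy-gradient agent: map the observation directly to a
    probability distribution over control signals and sample one."""
    probs = softmax(state @ weights)              # shape (N_PHASES,)
    action = rng.choice(N_PHASES, p=probs)
    return action, probs

def value_agent_act(state, weights):
    """Value-function-based agent: estimate a value for every legal
    control signal, then select the one with the highest value."""
    q_values = state @ weights                    # shape (N_PHASES,)
    return int(np.argmax(q_values)), q_values

state = rng.random(STATE_DIM)                     # fake intersection snapshot
w = rng.standard_normal((STATE_DIM, N_PHASES))    # stand-in for a trained net

pg_action, pg_probs = pg_agent_act(state, w)
v_action, q = value_agent_act(state, w)
```

The key structural difference survives even in this toy form: the PG agent's output is stochastic (a sampled phase), while the value-based agent's is a deterministic argmax over its estimates.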
Publisher
Institution of Engineering and Technology (IET)