The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period, even for Texas A&M users with NetID.
Utilizing Human Feedback in the Soft Actor-Critic Algorithm for Autonomous Driving
Abstract
Deep Reinforcement Learning (DRL) algorithms operate over fully continuous or discrete action spaces. They are widely used in autonomous driving because of their ability to cope with unseen environments. However, in a complex domain such as autonomous driving, these algorithms require extensive exploration of the environment before they converge. Among DRL algorithms, Soft Actor-Critic (SAC) is a powerful method capable of handling complex, continuous state-action spaces. Although SAC is robust in complex and dynamic environments, its main drawbacks are long training times and poor data efficiency. In addition, applying deep RL algorithms in areas where safety is essential, such as autonomous driving, raises a safety concern, since the car cannot be left to drive on the street unattended during training. One proposed solution to this issue is to utilize human feedback.
In the first approach of this research, we tested two methods for reducing the training time of Soft Actor-Critic (SAC) using human feedback. First, we pre-trained SAC with Learning from Demonstrations (LfD) to determine whether pre-training can reduce the algorithm's training time. Then, an online end-to-end method combining SAC, LfD, Learning from Interventions (LfI), and imperfect demonstrations was proposed to train an agent (dubbed Online Virtual Training). Both scenarios were implemented and tested on an inverted-pendulum task in OpenAI Gym and on autonomous driving in the CARLA simulator. The results showed a considerable reduction in training time and a significant increase in gained rewards for human demonstration and Online Virtual Training compared to the baseline SAC. The proposed approach is expected to be effective in daily-commute scenarios for autonomous driving, where the driver only needs to provide the required human feedback during the first few days of the commute.
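As a minimal illustrative sketch (not the dissertation's actual implementation), pre-training with LfD can be viewed as seeding the agent's replay buffer with human demonstration transitions before any environment interaction, so that early SAC updates draw on expert data rather than random exploration. The buffer structure, transition format, and demonstration data below are all assumptions made for illustration:

```python
import random
from collections import deque

class ReplayBuffer:
    """Simple FIFO replay buffer; transitions are (state, action, reward, next_state, done)."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling; returns fewer items if the buffer is not yet full enough.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def seed_with_demonstrations(buffer, demos):
    """Pre-fill the buffer with human demonstration transitions (LfD),
    so pre-training updates can run before the agent acts in the environment."""
    for transition in demos:
        buffer.add(transition)
    return buffer

# Hypothetical demonstration transitions: (state, action, reward, next_state, done)
demos = [((0.0,), 0.5, 1.0, (0.1,), False) for _ in range(32)]
buf = seed_with_demonstrations(ReplayBuffer(), demos)
batch = buf.sample(8)  # a pre-training step would compute the SAC actor/critic losses on this batch
```

In an actual pre-training loop, each sampled batch would feed the standard SAC actor and critic updates; only the data source (human demonstrations rather than the agent's own rollouts) changes.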
In the second approach, we investigated different forms of human feedback: head direction versus steering, and discrete versus continuous feedback. To this end, real-time human demonstrations from steering and from head direction, with discrete or continuous actions, were employed as human feedback in an autonomous driving task in the CARLA simulator. To obtain real-time human demonstrations, actions from a human expert and from SAC were alternated. We also tested discrete versus continuous feedback on an inverted-pendulum task, using an ideal controller to simulate a human expert, to obtain controlled experimental evidence. The results showed a significant reduction in training time and a significant increase in gained rewards for discrete feedback as opposed to continuous feedback. It was also shown that head-direction feedback can be almost as good as steering feedback.
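The two mechanisms described above — alternating actions between an expert and the agent, and discretizing otherwise continuous feedback — can be sketched as follows. The PD controller standing in for the ideal expert, the gain values, the discretization levels, and the alternation period are all hypothetical choices for illustration, not the dissertation's parameters:

```python
def expert_pd_controller(angle, velocity, kp=10.0, kd=2.0):
    """Idealized PD controller acting as the simulated human expert
    on an inverted-pendulum task (gains are illustrative)."""
    return -(kp * angle + kd * velocity)

def discretize(action, levels=(-1.0, 0.0, 1.0)):
    """Map a continuous action to the nearest discrete level
    (turning continuous feedback into discrete feedback)."""
    return min(levels, key=lambda lv: abs(lv - action))

def select_action(step, agent_action, state, period=2):
    """Alternate control between the expert and the learning agent:
    the expert acts on every `period`-th step, the agent otherwise."""
    if step % period == 0:
        return expert_pd_controller(*state)
    return agent_action

# Expert step (step 0) vs. agent step (step 1) under alternation:
a_expert = select_action(0, agent_action=0.3, state=(0.1, 0.0))  # expert's continuous action
a_agent = select_action(1, agent_action=0.3, state=(0.1, 0.0))   # agent keeps its own action
a_discrete = discretize(a_expert)  # the same feedback, coarsened to a discrete level
```

Both the continuous expert action and its discretized version would be logged as demonstration transitions; comparing training runs that consume one or the other isolates the discrete-versus-continuous effect.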
The main contribution of this work is an investigation of how different types of human intervention and feedback, combined with the SAC algorithm, can make reinforcement learning safer and faster during training. We expect the proposed methods to make deep reinforcement learning algorithms more robust in challenging environments such as autonomous driving.
Subject
Deep Reinforcement Learning
Soft Actor-Critic
Continuous Actions
Discrete Actions
Learning from Demonstrations
Learning from Interventions
Autonomous Driving
Inverted Pendulum
Citation
Savari, Maryam (2022). Utilizing Human Feedback in the Soft Actor-Critic Algorithm for Autonomous Driving. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/197962.