
dc.contributor.advisor: Choe, Yoonsuck
dc.creator: Savari, Maryam
dc.date.accessioned: 2023-05-26T18:02:55Z
dc.date.created: 2022-08
dc.date.issued: 2022-07-21
dc.date.submitted: August 2022
dc.identifier.uri: https://hdl.handle.net/1969.1/197962
dc.description.abstract: Deep Reinforcement Learning (DRL) algorithms are defined over fully continuous or discrete action spaces. These algorithms are widely used in autonomous driving due to their ability to cope with unseen environments. However, in a complex domain like autonomous driving, these algorithms must explore the environment extensively before they converge. Among DRL algorithms, Soft Actor-Critic (SAC) is a powerful method capable of handling complex and continuous state-action spaces. However, although SAC is robust in complex and dynamic environments, its main drawbacks are long training times and poor data efficiency. In addition, applying DRL algorithms in domains where safety is essential, such as autonomous driving, raises a safety issue, since a car cannot be left driving in the street unattended. One proposed solution to this issue is to utilize human feedback. In the first approach of this research, we tested two methods for reducing the training time of SAC using human feedback. First, we pre-trained SAC with Learning from Demonstrations (LfD) to determine whether pre-training can reduce the training time of the SAC algorithm. Then, an online end-to-end combination of SAC, LfD, Learning from Interventions (LfI), and imperfect demonstrations was proposed to train an agent (dubbed Online Virtual Training). Both scenarios were implemented and tested on an inverted-pendulum task in OpenAI Gym and an autonomous driving task in the CARLA simulator. The results showed a considerable reduction in training time and a significant increase in gained rewards for human demonstration and Online Virtual Training compared to the baseline SAC. The proposed approach is expected to be effective in daily commute scenarios for autonomous driving, where the driver only needs to provide the required human feedback during the first few days of commuting. In the second approach, we investigated different forms of human feedback: head direction vs. steering, and discrete vs. continuous feedback. To this end, real-time human demonstrations from steering and from the driver's head direction, with discrete or continuous actions, were employed as human feedback in an autonomous driving task in the CARLA simulator. In addition, we alternated actions from a human expert and SAC to obtain real-time human demonstrations (an illustrative sketch of this alternation appears after the record below). We also tested discrete vs. continuous feedback on an inverted-pendulum task for precise experimental verification, using an ideal controller to simulate a human expert. The results showed a significant reduction in training time and a significant increase in gained rewards for discrete feedback, as opposed to continuous feedback. It was also shown that head-direction feedback can be almost as good as steering feedback. The main contribution of this work is the investigation of the effects of different types of human intervention and feedback, in combination with the SAC algorithm, to make reinforcement learning safer and faster during training. We expect the methods proposed in this work to make deep reinforcement learning algorithms more robust in challenging environments such as autonomous driving.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Deep Reinforcement Learning
dc.subject: Soft Actor-Critic
dc.subject: Continuous Actions
dc.subject: Discrete Actions
dc.subject: Learning from Demonstrations
dc.subject: Learning from Interventions
dc.subject: Autonomous Driving
dc.subject: Inverted Pendulum
dc.title: Utilizing Human Feedback in the Soft Actor-Critic Algorithm for Autonomous Driving
dc.type: Thesis
thesis.degree.department: Computer Science and Engineering
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Doctor of Philosophy
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Kalathil, Dileep
dc.contributor.committeeMember: Chaspari, Theodora
dc.contributor.committeeMember: Song, Dezhen
dc.type.material: text
dc.date.updated: 2023-05-26T18:02:56Z
local.embargo.terms: 2024-08-01
local.embargo.lift: 2024-08-01
local.etdauthor.orcid: 0000-0003-2439-0405
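
A note on the methods described in the abstract above: the second approach alternates control between a human expert and the SAC agent, with an "ideal controller" simulating the expert in the inverted-pendulum experiments. The following is a minimal, hypothetical sketch of that alternation loop, not the dissertation's actual code. It assumes Gymnasium's Pendulum-v1 environment, uses a hand-tuned PD controller as a stand-in for the expert, and substitutes a random policy for the SAC agent; only the expert-labeled transitions are collected for demonstration-style replay.

import gymnasium as gym
import numpy as np

def expert_action(obs):
    # Stand-in "ideal controller" for the human expert (hypothetical gains):
    # a PD law on the pendulum angle; obs = [cos(theta), sin(theta), theta_dot].
    cos_t, sin_t, theta_dot = obs
    theta = np.arctan2(sin_t, cos_t)
    torque = -8.0 * theta - 2.0 * theta_dot
    return np.clip(np.array([torque], dtype=np.float32), -2.0, 2.0)

env = gym.make("Pendulum-v1")
obs, _ = env.reset(seed=0)
demo_buffer = []  # expert transitions, replayed later for LfD-style updates

for step in range(400):
    use_human = step % 2 == 0  # alternate expert / agent control each step
    if use_human:
        action = expert_action(obs)          # "human" (ideal controller) acts
    else:
        action = env.action_space.sample()   # placeholder for the SAC policy
    next_obs, reward, terminated, truncated, _ = env.step(action)
    if use_human:
        # Only expert-labeled transitions enter the demonstration buffer; a
        # real implementation would also update SAC online at every step.
        demo_buffer.append((obs, action, reward, next_obs))
    obs = next_obs
    if terminated or truncated:
        obs, _ = env.reset()

env.close()
print(f"collected {len(demo_buffer)} expert transitions")

In the actual method, the agent would be a full SAC learner updated from both its own and the expert's transitions; the 50/50 alternation schedule and the controller gains here are illustrative assumptions only.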

