
dc.contributor.advisor: Choe, Yoonsuck
dc.creator: Savari, Maryam
dc.date.accessioned: 2023-05-26T18:02:55Z
dc.date.created: 2022-08
dc.date.issued: 2022-07-21
dc.date.submitted: August 2022
dc.identifier.uri: https://hdl.handle.net/1969.1/197962
dc.description.abstract: Deep Reinforcement Learning (DRL) algorithms are defined over fully continuous or discrete action spaces. These algorithms are widely used in autonomous driving due to their ability to cope with unseen environments. However, in a complex domain like autonomous driving, these algorithms must explore the environment extensively before they converge. Among DRL algorithms, Soft Actor-Critic (SAC) is a powerful method capable of handling complex and continuous state-action spaces. However, although SAC is robust in complex and dynamic environments, its main drawbacks are long training times and poor data efficiency. In addition, applying DRL algorithms in domains where safety is essential, such as autonomous driving, raises a safety issue, since a car cannot be left driving in the street unattended. One proposed solution to this issue is to utilize human feedback. In the first approach of this research, we tested two methods for reducing the training time of SAC using human feedback. First, we pre-trained SAC with Learning from Demonstrations (LfD) to determine whether pre-training can reduce the training time of the SAC algorithm. Then, an online end-to-end combination of SAC, LfD, Learning from Interventions (LfI), and imperfect demonstrations was proposed to train an agent (dubbed Online Virtual Training). Both scenarios were implemented and tested on an inverted-pendulum task in OpenAI Gym and an autonomous driving task in the CARLA simulator. The results showed a considerable reduction in training time and a significant increase in gained rewards for human demonstration and Online Virtual Training compared to the baseline SAC. The proposed approach is expected to be effective in daily commute scenarios for autonomous driving, where the driver only needs to provide the required human feedback during the first few days of commuting. In the second approach, we investigated different forms of human feedback: head direction vs. steering, and discrete vs. continuous feedback. To this end, real-time human demonstrations from steering and from the driver's head direction, with discrete or continuous actions, were employed as human feedback in an autonomous driving task in the CARLA simulator. In addition, we alternated actions from a human expert and SAC to obtain real-time human demonstrations (an illustrative sketch of this alternation appears after the record below). We also tested discrete vs. continuous feedback on an inverted-pendulum task for precise experimental verification, using an ideal controller to simulate a human expert. The results showed a significant reduction in training time and a significant increase in gained rewards for discrete feedback, as opposed to continuous feedback. It was also shown that head-direction feedback can be almost as good as steering feedback. The main contribution of this work is the investigation of the effects of different types of human intervention and feedback, in combination with the SAC algorithm, to make reinforcement learning safer and faster during training. We expect the methods proposed in this work to make deep reinforcement learning algorithms more robust in challenging environments such as autonomous driving.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Deep Reinforcement Learning
dc.subject: Soft Actor-Critic
dc.subject: Continuous Actions
dc.subject: Discrete Actions
dc.subject: Learning from Demonstrations
dc.subject: Learning from Interventions
dc.subject: Autonomous Driving
dc.subject: Inverted Pendulum
dc.title: Utilizing Human Feedback in the Soft Actor-Critic Algorithm for Autonomous Driving
dc.type: Thesis
thesis.degree.department: Computer Science and Engineering
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Doctor of Philosophy
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Kalathil, Dileep
dc.contributor.committeeMember: Chaspari, Theodora
dc.contributor.committeeMember: Song, Dezhen
dc.type.material: text
dc.date.updated: 2023-05-26T18:02:56Z
local.embargo.terms: 2024-08-01
local.embargo.lift: 2024-08-01
local.etdauthor.orcid: 0000-0003-2439-0405
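
A note on the methods described in the abstract above: the second approach alternates control between a human expert and the SAC agent, with an "ideal controller" simulating the expert in the inverted-pendulum experiments. The following is a minimal, hypothetical sketch of that alternation loop, not the dissertation's actual code. It assumes Gymnasium's Pendulum-v1 environment, uses a hand-tuned PD controller as a stand-in for the expert, and substitutes a random policy for the SAC agent; only the expert-labeled transitions are collected for demonstration-style replay.

import gymnasium as gym
import numpy as np

def expert_action(obs):
    # Stand-in "ideal controller" for the human expert (hypothetical gains):
    # a PD law on the pendulum angle; obs = [cos(theta), sin(theta), theta_dot].
    cos_t, sin_t, theta_dot = obs
    theta = np.arctan2(sin_t, cos_t)
    torque = -8.0 * theta - 2.0 * theta_dot
    return np.clip(np.array([torque], dtype=np.float32), -2.0, 2.0)

env = gym.make("Pendulum-v1")
obs, _ = env.reset(seed=0)
demo_buffer = []  # expert transitions, replayed later for LfD-style updates

for step in range(400):
    use_human = step % 2 == 0  # alternate expert / agent control each step
    if use_human:
        action = expert_action(obs)          # "human" (ideal controller) acts
    else:
        action = env.action_space.sample()   # placeholder for the SAC policy
    next_obs, reward, terminated, truncated, _ = env.step(action)
    if use_human:
        # Only expert-labeled transitions enter the demonstration buffer; a
        # real implementation would also update SAC online at every step.
        demo_buffer.append((obs, action, reward, next_obs))
    obs = next_obs
    if terminated or truncated:
        obs, _ = env.reset()

env.close()
print(f"collected {len(demo_buffer)} expert transitions")

In the actual method, the agent would be a full SAC learner updated from both its own and the expert's transitions; the 50/50 alternation schedule and the controller gains here are illustrative assumptions only.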

