Field | Value | Language
dc.contributor.advisor | Mahapatra, Rabinarayan |
dc.contributor.advisor | Sarin, Vivek |
dc.creator | Hooli, Mayuresh |
dc.date.accessioned | 2020-08-26T16:50:11Z |
dc.date.available | 2020-08-26T16:50:11Z |
dc.date.created | 2019-12 |
dc.date.issued | 2019-12-04 |
dc.date.submitted | December 2019 |
dc.identifier.uri | https://hdl.handle.net/1969.1/188746 |
dc.description.abstract | The Internet of Things (IoT) has become a game changer, enabling new ecosystems and business models that affect nearly every aspect of human life today. Artificial intelligence (AI) has established itself as the key component of these emerging ecosystems: the data they generate and the machine learning (ML) algorithms trained on it together act as the brain, through a centralized cloud model. However, to meet the low-latency requirements of critical applications and to take advantage of private data, machine learning is shifting from the centralized cloud to the distributed edge. ML models are only as good as their input data, so data quality becomes the key success factor, which creates the need for real-time data cleaning at the edge. Current techniques either require manual intervention to clean the data or, when fully automated, do not work efficiently. The two-phase process proposed in this research combines two complementary techniques to remove almost all outliers. The first phase prepares a base for the second phase to avoid overfitting, while the second phase splits the data into subsets based on their trends in order to remove the outliers. The data is then imputed, yielding a near-perfect representation of the cleaned data in a completely automated way. We compare the two algorithms derived from this approach with standard algorithms and find that both perform considerably better. The technique is univariate, but it can be extended to a multivariate technique through an ensemble method, allowing the entire data set to be cleaned and giving a better representation of the complete data. The technique is useful not only in the IoT domain but also in the telecom domain, where data-driven decisions at the edge are becoming critical with the advent of 5G. The algorithm also facilitates AutoML and federated learning. | en
dc.format.mimetype | application/pdf |
dc.language.iso | en |
dc.subject | Data Cleaning | en
dc.subject | 5G | en
dc.subject | Internet of Things | en
dc.subject | Machine Learning | en
dc.subject | Edge | en
dc.title | Data Cleaning on Edge | en
dc.type | Thesis | en
thesis.degree.department | Computer Science and Engineering | en
thesis.degree.discipline | Computer Engineering | en
thesis.degree.grantor | Texas A&M University | en
thesis.degree.name | Master of Science | en
thesis.degree.level | Masters | en
dc.contributor.committeeMember | Narayanan, Krishna |
dc.type.material | text | en
dc.date.updated | 2020-08-26T16:50:12Z |
local.etdauthor.orcid | 0000-0003-4938-3447 |
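The abstract describes the pipeline only at a high level (a first phase that prepares a base, a second phase that splits the data by trend and removes outliers, then imputation), so the sketch below is an illustrative stand-in and not the thesis's actual algorithm. It assumes NumPy and makes three substitutions that are our own choices: a loose global median/MAD (modified z-score) filter for phase 1, fixed-size segments with a per-segment linear trend in place of true trend-based splitting for phase 2, and linear interpolation for imputation. The names `clean_series`, `robust_outliers`, `window`, `k_global`, and `k_local` are hypothetical.

```python
import numpy as np


def robust_outliers(values, k=3.5):
    """Flag entries whose modified z-score (median/MAD based) exceeds k.

    NaN entries are treated as already missing and are never flagged.
    """
    values = np.asarray(values, dtype=float)
    flags = np.zeros(values.shape, dtype=bool)
    finite = np.isfinite(values)
    if finite.sum() < 3:
        return flags
    v = values[finite]
    med = np.median(v)
    mad = np.median(np.abs(v - med))
    if mad == 0:
        return flags  # no spread: nothing can be judged an outlier
    z = 0.6745 * (v - med) / mad
    flags[finite] = np.abs(z) > k
    return flags


def clean_series(x, window=50, k_global=5.0, k_local=3.5):
    """Hypothetical two-phase univariate cleaner; returns (cleaned, replaced_mask)."""
    x = np.asarray(x, dtype=float).copy()
    t = np.arange(len(x))

    # Phase 1 (assumed): a loose global pass drops only gross outliers, so the
    # per-segment trend fits in phase 2 are not dragged by extreme spikes.
    x[robust_outliers(x, k_global)] = np.nan

    # Phase 2 (assumed): split into fixed-size segments as a stand-in for
    # trend-based splitting, fit a linear trend per segment, and flag points
    # whose residual from that trend is anomalous.
    for start in range(0, len(x), window):
        ts = t[start:start + window]
        xs = x[start:start + window]
        ok = np.isfinite(xs)
        if ok.sum() < 3:
            continue
        slope, intercept = np.polyfit(ts[ok], xs[ok], 1)
        resid = xs - (slope * ts + intercept)
        flags = robust_outliers(resid, k_local)
        x[start:start + window] = np.where(flags, np.nan, xs)

    # Imputation (assumed): fill every removed or originally missing sample by
    # linear interpolation between the nearest retained neighbours.
    missing = ~np.isfinite(x)
    if missing.all():
        raise ValueError("no valid samples left to impute from")
    x[missing] = np.interp(t[missing], t[~missing], x[~missing])
    return x, missing
```

For example, `cleaned, replaced = clean_series(readings, window=50)` returns the imputed series together with a mask of the samples that were replaced. The loose `k_global` threshold keeps phase 1 from over-trimming, leaving most of the work to the local, trend-aware pass; a multivariate variant in the spirit of the abstract's ensemble idea could apply the same function column by column, though how the thesis combines columns is not stated here.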