DC field | Value | Language
dc.creator | Dragičević, Aleksandra |
dc.creator | Kosić, Boris |
dc.creator | Jeli, Zorana |
dc.date.accessioned | 2023-03-09T20:20:47Z |
dc.date.available | 2023-03-09T20:20:47Z |
dc.date.issued | 2018 |
dc.identifier.uri | https://machinery.mas.bg.ac.rs/handle/123456789/5606 |
dc.description.abstract | Reducing data dimensionality is necessary for optimal model performance in machine learning. Two different approaches are used in practice to address this problem. The first reduces dimensionality by removing highly correlated variables, on the premise that such variables measure the same thing; in practice, all variables with a high average correlation are removed. Variables are therefore removed regardless of their significance to model accuracy, and as a result model accuracy can drop significantly. The second approach, aimed at a better understanding of the data and of the relationship between the variables and the model outcome, quantifies the impact of each variable on the model outcome and ranks the variables by these values. The variables with the lowest importance are then removed from the dataset, which can increase the performance and accuracy of the final model. In datasets with highly correlated variables (e.g., spectroscopy data), the most important variables may also be those with the highest average correlation, and removing them can significantly reduce model accuracy. Based on these observations, a new method is proposed that combines the two approaches by using the most important variables with the lowest correlation. With this approach it is possible to significantly reduce dataset dimensionality so that the retained variables have low correlation. | sr
dc.language.iso | en | sr
dc.rights | restrictedAccess | sr
dc.source | CNN Tech 2018 "International Conference of Experimental and Numerical Investigations and New Technologies", Book of Abstracts | sr
dc.subject | Machine learning | sr
dc.subject | highly correlated data | sr
dc.subject | variable importance | sr
dc.subject | data dimensionality | sr
dc.title | The New Method for Removing Highly Correlated Variables from Datasets | sr
dc.type | conferenceObject | sr
dc.rights.license | ARR | sr
dc.citation.rank | M34 |
dc.citation.spage | 13-13 |
dc.identifier.rcub | https://hdl.handle.net/21.15107/rcub_machinery_5606 |
dc.type.version | publishedVersion | sr
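The abstract contrasts three variable-selection strategies: dropping variables by average correlation, dropping the least important variables, and a combination that favours important variables with low correlation. The Python sketch below illustrates all three on a tiny synthetic dataset. It is not the authors' implementation. The importance score (absolute Pearson correlation with the target), the 0.8 cutoff, the greedy rule used for the combined method, the toy data, and all function names are assumptions made for illustration only.

```python
import math
from typing import Dict, List, Sequence

Data = Dict[str, List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def mean_abs_corr(data: Data, var: str, kept: List[str]) -> float:
    """Average |r| between `var` and every other variable still in `kept`."""
    others = [w for w in kept if w != var]
    return sum(abs(pearson(data[var], data[w])) for w in others) / len(others)


def drop_by_average_correlation(data: Data, cutoff: float) -> List[str]:
    """Approach 1: while some pair has |r| > cutoff, take the most correlated
    pair and drop the member with the higher average |r|.
    Relevance to the model outcome is never consulted."""
    kept = list(data)
    while True:
        pairs = [(abs(pearson(data[a], data[b])), a, b)
                 for i, a in enumerate(kept) for b in kept[i + 1:]]
        over = [p for p in pairs if p[0] > cutoff]
        if not over:
            return kept
        _, a, b = max(over)  # most strongly correlated pair
        if mean_abs_corr(data, a, kept) >= mean_abs_corr(data, b, kept):
            kept.remove(a)
        else:
            kept.remove(b)


def importance(data: Data, target: Sequence[float]) -> Dict[str, float]:
    """Stand-in importance score (an assumption): |r| with the target."""
    return {name: abs(pearson(col, target)) for name, col in data.items()}


def keep_most_important(data: Data, target: Sequence[float], k: int) -> List[str]:
    """Approach 2: rank variables by importance and keep the top k."""
    imp = importance(data, target)
    return sorted(data, key=lambda name: imp[name], reverse=True)[:k]


def important_and_uncorrelated(data: Data, target: Sequence[float],
                               cutoff: float) -> List[str]:
    """Hypothetical combination: visit variables from most to least important
    and keep one only if its |r| with every variable kept so far is <= cutoff."""
    imp = importance(data, target)
    kept: List[str] = []
    for v in sorted(data, key=lambda name: imp[name], reverse=True):
        if all(abs(pearson(data[v], data[w])) <= cutoff for w in kept):
            kept.append(v)
    return kept


# Hypothetical toy data: x1 tracks the target exactly, x2 and x3 are strongly
# correlated with x1 (|r| ~ 0.88), and x4 is uncorrelated with all of them.
DATA: Data = {
    "x1": [-5, -3, -1, 1, 3, 5],
    "x2": [-2.5, -3.5, -3, -1, 2.5, 7.5],
    "x3": [-7.5, -2.5, 1, 3, 3.5, 2.5],
    "x4": [-5, 7, 4, -4, -7, 5],
}
TARGET = [1, 2, 3, 4, 5, 6]

if __name__ == "__main__":
    print(drop_by_average_correlation(DATA, 0.8))         # ['x2', 'x3', 'x4']
    print(keep_most_important(DATA, TARGET, 2))           # ['x1', 'x2']
    print(important_and_uncorrelated(DATA, TARGET, 0.8))  # ['x1', 'x4']
```

On this toy data, the average-correlation filter discards x1 even though it is the variable most strongly related to the target, which is the failure mode the abstract describes for highly correlated data. The importance ranking keeps x1 and x2 although they are strongly correlated with each other. The combined sketch keeps x1 and drops its redundant neighbours x2 and x3.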


Documents

There are no files associated with this item.
