The New Method for Removing Highly Correlated Variables from Datasets

Dragičević, Aleksandra; Kosić, Boris; Jeli, Zorana

dc.creator	Dragičević, Aleksandra
dc.creator	Kosić, Boris
dc.creator	Jeli, Zorana
dc.date.accessioned	2023-03-09T20:20:47Z
dc.date.available	2023-03-09T20:20:47Z
dc.date.issued	2018
dc.identifier.uri	https://machinery.mas.bg.ac.rs/handle/123456789/5606
dc.description.abstract	Reducing data dimensionality is necessary for optimal model performance in machine learning. Two different approaches are used in practice to solve this problem. The first reduces dimensionality by removing highly correlated variables, on the premise that multiple variables measure the same thing; it does so by removing all variables with a high average correlation. Variables are thus removed regardless of their significance to model accuracy, and as a result model accuracy can drop significantly. The second approach, aimed at a better understanding of the data and of the relationship between the variables and the model outcome, quantifies each variable's impact on the model outcome and ranks the variables by these values. The variables with the lowest importance are then removed from the dataset, which can increase the performance and accuracy of the final model. In datasets with highly correlated variables (e.g. sets of spectroscopy data), the most important variables can be those with the highest average correlation, and removing them can significantly reduce the accuracy of the model. Based on these facts, a new method that uses the most important variables with the lowest correlation is proposed as a combination of the previous two approaches; with it, dataset dimensionality can be reduced significantly to a dataset in which the variables have low correlation.	sr
dc.language.iso	en	sr
dc.rights	restrictedAccess	sr
dc.source	CNN Tech 2018 "International Conference of Experimental and Numerical Investigations and New Technologies", Book of Abstracts	sr
dc.subject	Machine learning	sr
dc.subject	highly correlated data	sr
dc.subject	variable importance	sr
dc.subject	data dimensionality	sr
dc.title	The New Method for Removing Highly Correlated Variables from Datasets	sr
dc.type	conferenceObject	sr
dc.rights.license	ARR	sr
dc.citation.rank	M34
dc.citation.spage	13-13
dc.identifier.rcub	https://hdl.handle.net/21.15107/rcub_machinery_5606
dc.type.version	publishedVersion	sr
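
The abstract describes combining importance ranking with correlation filtering but does not give the procedure. The following is a minimal Python sketch of one way such a combination could work, not the authors' algorithm. It assumes a greedy rule: visit variables from most to least important and keep a variable only if its absolute Pearson correlation with every variable already kept is below a threshold. The function names `pearson` and `select_features`, the threshold of 0.9, the toy data, and the importance scores are all hypothetical; in practice the scores would come from whatever model is used to quantify variable impact.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / sqrt(var_x * var_y)


def select_features(columns, importance, threshold=0.9):
    """Greedy selection (an assumed rule, not taken from the source).

    columns:    dict mapping variable name -> list of values
    importance: dict mapping variable name -> importance score
    threshold:  absolute correlation at or above which a variable is
                considered redundant with one already kept
    """
    # Most important variables are considered first, so they are never
    # discarded in favour of a less important correlated variable.
    ranked = sorted(columns, key=lambda name: importance[name], reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" is almost a multiple of "a"; "c" is weakly related to both.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
# Illustrative importance scores only; not results from any real model.
importance = {"a": 0.5, "b": 0.3, "c": 0.2}

selected = select_features(columns, importance, threshold=0.9)
print(selected)
```

In this toy case the most important variable "a" is kept even though it is highly correlated with "b", and "b" is the one dropped. A filter based purely on average correlation could have removed "a" instead, which is the failure mode the abstract points out.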

This document appears in the following collections

MF - radovi istraživača