Variable Selection in Regression using Maximal Correlation and Distance Correlation
Date and Time
9 November 2016 - 10:00
Abstract: In most regression problems, the first task is to select the predictors that best explain the response and to remove the others from the model. In the statistical literature this is referred to as the variable selection problem. Numerous methods have been proposed in this field, most of which address linear models. In this study we propose two variable selection criteria for regression based on two powerful dependence measures, maximal correlation and distance correlation. We focus on these two measures because they fully or partially satisfy Rényi's postulates for dependence measures and are therefore able to detect nonlinear dependence structures. Consequently, our methods should be suitable for nonlinear as well as linear regression models. Both methods are easy to implement and perform well. We illustrate the performance of the proposed methods via simulations and compare them with two benchmark methods, stepwise AIC and the lasso. In several cases with linear dependence, all four methods performed comparably. In the presence of nonlinear dependence, or of dependence between variables that are uncorrelated, we observed that the proposed methods may be preferable. An application of the proposed methods to a real financial data set is also provided.
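The abstract rests on the claim that distance correlation detects dependence that ordinary (Pearson) correlation misses. A minimal pure-Python sketch of the sample distance correlation of Székely, Rizzo and Bakirov (2007), in its V-statistic form, makes this concrete on y = x², where Pearson correlation is essentially zero. The helper `rank_by_distance_correlation` is a hypothetical marginal screening step added only for illustration; it is not the selection criterion proposed in the talk, and maximal correlation is not sketched here.

```python
import math
from typing import Sequence


def _double_centered_distances(v: Sequence[float]) -> list[list[float]]:
    """Pairwise |v_i - v_j| matrix with row, column and grand means removed."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in d]
    # d is symmetric, so the column means equal the row means
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation of two univariate samples (V-statistic form)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    # squared distance covariance and squared distance variances
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x2 = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y2 = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar_x2 * dvar_y2)
    if denom == 0.0:
        return 0.0  # a constant sample carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / denom)


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary Pearson correlation coefficient, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("Pearson correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def rank_by_distance_correlation(
    predictors: dict[str, Sequence[float]], y: Sequence[float]
) -> list[tuple[str, float]]:
    """Hypothetical marginal screening: sort predictors by dCor with the response."""
    scores = [(name, distance_correlation(col, y)) for name, col in predictors.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    xs = [i / 10 for i in range(-20, 21)]
    ys = [v * v for v in xs]  # purely nonlinear, symmetric dependence
    print(f"Pearson r = {pearson_correlation(xs, ys):.3f}")
    print(f"dCor      = {distance_correlation(xs, ys):.3f}")
```

On this symmetric parabola the Pearson coefficient is numerically zero while the distance correlation is clearly positive, which is the behaviour the abstract appeals to for nonlinear and uncorrelated dependence. The O(n²) memory of the distance matrices is acceptable for a sketch but not for large samples.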
Biography: Deniz Yenigun is an Associate Professor of Statistics in the Faculty of Engineering and Natural Sciences at Istanbul Bilgi University (Istanbul, Turkey), where he teaches courses on probability, statistics, quality control, and reliability analysis. He received his Ph.D. in Statistics from Bowling Green State University (Bowling Green, Ohio, USA) in 2007 and worked as an Assistant Professor in the Faculty of Business Administration at Bilkent University (Ankara, Turkey) before joining Istanbul Bilgi University in 2014. His research interests include alternative correlation measures, variable selection in regression, reliability, survival analysis, and network analysis.