Partial Generalized Additive Models: An Information-theoretic Approach for Selecting Variables and Avoiding Concurvity
Topic: Partial Generalized Additive Models: An Information-theoretic Approach for Selecting Variables and Avoiding Concurvity
Speaker: Dr. Hong Gu (Dalhousie University)
Time: 2009-01-14, 10:30 - 11:30 AM
Venue: Science Building No. 1, Room 1418
Generalized additive models (GAM) are a class of multivariate nonparametric regression models that are potentially very useful data exploration tools for scientists in many fields. Scientists are often interested in finding which covariates affect the response variable, and in what way, rather than just making predictions. Answering such questions often requires input from both statistical modelling and background knowledge. When the covariates are all independent, GAM can be very useful in uncovering the covariates' functional effects, provided the smoothing parameters are properly controlled. However, concurvity can cause serious difficulties. In particular, severe concurvity among potential covariates often leads to unstable or even wrong estimates of the covariates' functional effects. In this paper, we develop a new procedure called partial generalized additive models (pGAM). pGAM sequentially maximizes the mutual information (MI) between the response variable and the covariates and selects the optimal covariate to enter the model. Introduced by Shannon, MI provides a good measure of nonlinear dependence between variables and can be viewed as a generalized, nonlinear version of Pearson's correlation coefficient. At each step, pGAM also removes any functional dependencies between the remaining covariates and the ones already in the model, thereby allowing users to observe a direct measure of the degree of concurvity and obtain more stable models. Plots of these functional dependencies between covariates provide insight into the structure of the concurvity and allow the resulting model to be interpreted more precisely. With simulation and a number of real-data examples, we show that pGAM is a reasonable and meaningful variable selection procedure and gives much better estimates of the covariates' functional effects.
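To make the two ideas in the abstract concrete (ranking covariates by MI with the response, then stripping from the remaining covariates whatever the selected one explains), here is a minimal Python sketch. It is not the speaker's pGAM implementation: the histogram plug-in MI estimator, the cubic polynomial used as a stand-in for a smoother, and the function names `mutual_information`, `residualize` and `greedy_mi_selection` are all assumptions made for illustration.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in histogram estimate of I(X; Y) in nats (an assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint cell probabilities
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    mask = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))


def residualize(target, predictor, degree=3):
    """Subtract a polynomial fit of target on predictor.

    The polynomial is only a stand-in for a nonparametric smoother.
    """
    coefs = np.polyfit(predictor, target, degree)
    return target - np.polyval(coefs, predictor)


def greedy_mi_selection(X, y, n_steps, bins=8, degree=3):
    """Greedy forward selection by MI, with residualization after each step.

    Returns the column indices of X in the order they were selected.
    """
    X = np.asarray(X, dtype=float).copy()
    r = np.asarray(y, dtype=float).copy()  # current (partial) response
    remaining = list(range(X.shape[1]))
    selected = []
    for _ in range(min(n_steps, len(remaining))):
        # Score every remaining covariate by its MI with the current response.
        scores = {j: mutual_information(X[:, j], r, bins) for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        xb = X[:, best].copy()
        # Remove what the selected covariate explains from the response ...
        r = residualize(r, xb, degree)
        # ... and from every remaining covariate (the concurvity part).
        for j in remaining:
            X[:, j] = residualize(X[:, j], xb, degree)
    return selected
```

Calling `greedy_mi_selection(X, y, n_steps=k)` on an `(n, p)` array returns up to `k` column indices in selection order. Plotting each fitted polynomial from `residualize` against the selected covariate would give a rough analogue of the functional-dependency plots the abstract mentions.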